Undergraduate Texts in Mathematics 
Kennan T. Smith 


Primer of 
Modern Analysis 




Springer-Verlag New York Berlin Heidelberg Tokyo




Undergraduate Texts in Mathematics 


Editors 
F. W. Gehring 
P. R. Halmos 


Advisory Board 


C. DePrima 
I. Herstein 


Undergraduate Texts in Mathematics 


Apostol: Introduction to Analytic 
Number Theory. 
1976. xii, 338 pages. 24 illus. 


Armstrong: Basic Topology. 
1983. xii, 260 pages. 132 illus. 


Bak/Newman: Complex Analysis. 
1982. x, 224 pages. 69 illus. 


Banchoff/Wermer: Linear Algebra 
Through Geometry. 
1983. x, 257 pages. 81 illus. 


Childs: A Concrete Introduction to 
Higher Algebra. 
1979. xiv, 338 pages. 8 illus. 


Chung: Elementary Probability Theory 
with Stochastic Processes. 
1975. xvi, 325 pages. 36 illus. 


Croom: Basic Concepts of Algebraic 
Topology. 
1978. x, 177 pages. 46 illus. 


Fischer: Intermediate Real Analysis. 
1983. xiv, 770 pages. 100 illus. 


Fleming: Functions of Several Variables.
Second edition.
1977. xi, 411 pages. 96 illus.


Foulds: Optimization Techniques: An 
Introduction. 
1981. xii, 502 pages. 72 illus. 


Franklin: Methods of Mathematical 
Economics. Linear and Nonlinear 
Programming. Fixed-Point Theorems. 
1980. x, 297 pages. 38 illus. 


Halmos: Finite-Dimensional Vector 
Spaces. Second edition. 
1974. viii, 200 pages. 


Halmos: Naive Set Theory. 
1974. vii, 104 pages.


Iooss/Joseph: Elementary Stability and 
Bifurcation Theory. 
1980. xv, 286 pages. 47 illus. 


Kemeny/Snell: Finite Markov Chains. 
1976. ix, 224 pages. 11 illus. 


Lang: Undergraduate Analysis.
1983. xiii, 545 pages. 52 illus. 


Lax/Burstein/Lax: Calculus with
Applications and Computing,
Volume 1.
1976. xi, 513 pages. 170 illus.


LeCuyer: College Mathematics with 
A Programming Language. 
1978. xii, 420 pages. 144 illus. 


Macki/Strauss: Introduction to Optimal
Control Theory.
1981. xiii, 168 pages. 68 illus.


continued after Index 


Kennan T. Smith 


Primer of 


Modern Analysis 


(Directions for Knowing All Dark Things, 
Rhind Papyrus, 1800 B.c.) 




Springer-Verlag 
New York Berlin Heidelberg Tokyo 


Kennan T. Smith 
Mathematics Department 
Oregon State University 
Corvallis, Oregon 97331 


U.S.A.

Editorial Board 

P. R. Halmos F. W. Gehring 

Indiana University University of Michigan 
Department of Mathematics Department of Mathematics 
Bloomington, Indiana 47405 Ann Arbor, Michigan 48104 
U.S.A.                          U.S.A.


AMS Subject Classification: 26-01, 28-01


Library of Congress Cataloging in Publication Data 
Smith, Kennan T., 1926— 

Primer of modern analysis. 

(Undergraduate texts in mathematics) 

Includes index. 

1. Mathematical analysis. I. Title. II. Series.


QA300.877 1983 515 83-538 


The original version of this book was published by Bogden & Quigley, Inc., 
Publishers, in 1971. 


©1971 by Bogden & Quigley, Inc., Publishers. 
©1983 by Springer-Verlag New York Inc. 
All rights reserved. No part of this book may be translated or reproduced in any
form without written permission from Springer-Verlag, 175 Fifth Avenue, New
York, N.Y. 10010, U.S.A.


Printed and bound by Halliday Lithograph, West Hanover, MA. 
Printed in the United States of America. 




ISBN 0-387-90797-1 Springer-Verlag New York Berlin Heidelberg Tokyo 
ISBN 3-540-90797-1 Springer-Verlag Berlin Heidelberg New York Tokyo 


To J. 




Preface 


This book discusses some of the first principles of modern analysis. It can be 
used for courses at several levels, depending upon the background and ability 
of the students. 

It was written on the premise that today’s good students have unexpected 
enthusiasm and nerve. When hard work is put to them, they work harder and 
ask for more. The honors course (at the University of Wisconsin) which 
inspired this book was, I think, more fun than the book itself. And better. 
But then there is acting in teaching, and a typewriter is a poor substitute for an 
audience. The spontaneous, creative disorder that characterizes an exciting 
course becomes silly in a book. To write, one must cut and dry. Yet, I hope
enough of the spontaneity, enough of the spirit of that course, is left to enable 
those using the book to create exciting courses of their own. 

Exercises in this book are not designed for drill. They are designed to 
clarify the meanings of the theorems, to force an understanding of the proofs, 
and to call attention to points in a proof that might otherwise be overlooked. 
The exercises, therefore, are a real part of the theory, not a collection of side 
issues, and as such nearly all of them are to be done. Some drill is, of course, 
necessary, particularly in the calculation of integrals. 

Those using the book should not feel obliged to do every proof. It is more 
important for teachers to explain the theorems well and to show how they are 
used, and why they are interesting, than to spend all the time on proofs. This 
is one place where the teacher has an advantage over the author. He can 
choose proofs that seem to him exciting or illuminating, and skip some of the 
others. The author, however, must do nearly all. In this book I have omitted 
only the proof of Fubini’s theorem—in favor of a long list of applications. 

Many topics in the mathematics curriculum find their best use in the 
calculus of several variables: for example, much linear algebra, much topology, 
much measure theory, and so forth. Usually students learn them as separate 
topics. As a result, they understand these subjects narrowly and apply them 
poorly. I have therefore done quite a bit of linear algebra, topology, and measure
theory, but always with the applications in mind and following close
behind. The result should be that students will understand both sides much 
better. 

Part I begins with a half intuitive—half rigorous discussion of applications, 
chosen to arouse interest and to show the need for a precise and general theory, 
and then develops this theory for functions of one variable. Unusual features 
include the solid treatment of Taylor’s formula, the discussion of real analytic 
functions, and the Weierstrass approximation theorem. 

In Part II the differential properties of functions of several variables are 
studied. There is some background on metric and vector spaces, but the bulk
of this part deals with applications of the implicit-function theorem to the study 
of surfaces and manifolds, tangent and normal planes, maximum and minimum 
problems in several variables and on manifolds, and so forth. Various interest- 
ing sidelights, such as the derivation of Kepler’s laws of planetary motion and 
mini-max descriptions of eigenvalues, are included.

In Part III the integration and differentiation of measures are studied. 
The Lebesgue theory of integration is developed in the simple, yet perfectly 
general, abstract setting of outer measures, and applied in many and diverse 
situations, such as integration in R^n, summation of multiple power series, and
Sard's theorem on regular values of differentiable functions. The Lebesgue
theory of differentiation is presented for regular Borel measures on R^n and used,
for example, in establishing the formulas for change of variable in multiple
integrals. The theory of differentiation leads naturally to the study of surface
area via the area measures of Hausdorff. In the final chapter I discuss the 
Brouwer degree of maps of spheres and its applications, developing the degree 
from the analytic point of view suggested by John Milnor. 

Theorems, Definitions, etc., are numbered within each chapter and section. 
Thus, Theorem 6.3 of Chapter 8 is found in Section 6 of Chapter 8. Theorem 
6.3 without any chapter reference is found in Section 6 of the chapter in which 
the reference is made. The chapter number and title are printed in the upper 
left-hand corner of each double-page spread.

The index lists most of the terms and symbols that are used and the page 
or pages on which they are defined. The symbols occur ahead of the terms 
beginning with the same letter. Thus, |A| and a» occur at the head of the a's.

I wish to thank my colleagues at Oregon State University and at the Uni- 
versity of Oregon who read and commented upon earlier versions of the manu- 
script. These include Professors P. M. Anselone, D. S. Carter, R. B. Guenther, 
B. Petersen, and, particularly, R. M. Koch. Professor Norton Starr of Amherst 
College also read an earlier version of the manuscript and made suggestions. 
In addition, I wish to thank Professor D. C. Rung of The Pennsylvania State
University for suggesting the title. Finally, I wish to praise Mr. Edward J. 
Quigley, who is a new publisher, but a good one. 




It is fitting to end this preface with advice to the reader from the creator 
and patron saint of calculus. The following statement came in answer to the 
question of how he had made his famous discoveries:


Isaac Newton 


“By always thinking about them, I keep the subject constantly before me and 
wait till the first dawnings open little by little into the full light.” 


PREFACE TO THE 
SPRINGER EDITION 


Rademacher’s theorem on the differentiability of Lipschitz functions has been 
added. Applications of Rademacher’s theorem and the Brouwer degree to 
changes of variable in multiple integrals have been added. The main addition, 
however, is a chapter on the results of Hestenes, Seeley, and Adams—Aronszajn— 
Smith on extension of differentiable functions of various kinds across Lipschitz 
graphs. A construction is given for a single extension operator which applies 
to functions of class C^m, functions of class C^m with bounded derivatives, functions
of class C^m with Hölder continuous derivatives, and to Sobolev functions. It
applies to many other function classes as well, but these are the ones discussed
explicitly. The discussion of the Sobolev spaces requires a minimal knowledge
of L^p spaces (mainly the Hölder and Minkowski inequalities). The theorems
cover polyhedral domains, so they are of use in the numerical study of partial 
differential equations, as well as of theoretical interest. 


K.T.S.




Contents


Preface


PART I


CHAPTER 1 APPLICATIONS

1. Tangent Lines
2. Derivatives
3. Maximum and Minimum Problems
4. Velocity and Acceleration
5. Area


CHAPTER 2 CALCULATION OF DERIVATIVES

1. Limits
2. Limits and Derivatives
3. Derivatives of Sums, Products, and Quotients
4. Continuity
5. Trigonometric Functions
6. Composite Functions
7. Logarithms and Exponentials


CHAPTER 3 DEEPER PROPERTIES OF CONTINUOUS FUNCTIONS

1. Inverse Functions
2. Uniform Continuity
3. Maxima and Minima
4. The Mean-Value Theorem
5. Zero and Infinity

CHAPTER 4 RIEMANN INTEGRATION

1. Area
2. Integrals
3. Elementary Functions
4. Change of Variable
5. Integration by Parts
6. Riemann Sums
7. Arc Length
8. Polar Coordinates
9. Volume
10. Improper Integrals


CHAPTER 5 TAYLOR'S FORMULA

1. Taylor's Formula
2. Equivalent Formulas
3. Local Maxima and Minima


CHAPTER 6 SEQUENCES AND SERIES

1. Sequences and Series
2. Increasing Sequences and Positive Series
3. Cauchy Sequences
4. Sequences of Functions
5. Power Series
6. Analytic Functions
7. Examples
8. Weierstrass Approximation Theorem


PART II


CHAPTER 7 METRIC SPACES

1. The Space R^n
2. Absolute Value in R^n
3. Metric Spaces
4. Function Spaces
5. Equivalent Metrics
6. Open and Closed Sets
7. Connected Spaces
8. Composite Functions and Subsequences
9. Compact Spaces
10. Equivalence of Absolute Values on R^n

11. Products
12. Stone-Weierstrass Approximation Theorem


CHAPTER 8 FUNCTIONS FROM R^1 TO R^n

1. Lines, Half-lines, and Directions
2. Derivatives and Integrals
3. Tangent Lines, Velocity, and Acceleration
4. Geometric Models of R^n
5. Missiles, Moons, and so on
6. Arc Length


CHAPTER 9 ALGEBRA AND GEOMETRY IN R^n

1. Subspaces
2. Bases
3. Orthonormal Bases
4. Linear Transformations
5. Sums and Products
6. Null Space and Range
7. Matrices and Linear Equations
8. Continuity of Linear Transformations
9. Self-adjoint Transformations
10. Orthogonal Transformations
11. Determinants


CHAPTER 10 LINEAR APPROXIMATION

1. Directional Derivatives and Partial Derivatives
2. The Differential
3. Existence of the Differential
4. Composite Functions
5. The Mean-Value Theorem
6. A Fixed-Point Theorem
7. The Inverse-Function Theorem
8. The Implicit-Function Theorem


CHAPTER 11 SURFACES

1. Algebraic Curves
2. Manifolds
3. Tangent Spaces
4. Functions on Manifolds
5. Quadratic Forms and Quadric Surfaces

CHAPTER 12 HIGHER DERIVATIVES

1. Second Derivatives
2. Higher Derivatives
3. The Inverse- and Implicit-Function Theorems
4. Taylor's Formula
5. Local Maxima and Minima


PART III


CHAPTER 13 INTEGRATION

1. Introduction
2. Lebesgue Measure
3. Outer Measures
4. Measurability in R^n
5. Measurable Functions
6. Definition of the Integral
7. Convergence Theorems
8. Integrable Functions
9. Product Measures
10. Functions Defined by Integrals
11. Convolution
12. Approximation Theorems
13. Multiple Series
14. Regular Values and Sard's Theorem


CHAPTER 14 DIFFERENTIATION

1. Regular Borel Measures
2. Differentiability Theorems
3. Integration of Derivatives
4. Change of Variable
5. Differentiability of Lipschitz Functions


CHAPTER 15 SURFACE AREA

1. Area Measures
2. Parametric Surfaces: Introductory Remarks
3. The Jacobian
4. Absolute Continuity
5. Variation
6. The Jacobian Formula for Surface Area
7. Examples
8. Polar Coordinates

CHAPTER 16 THE BROUWER DEGREE

1. Introduction
2. The Degree for C” Functions
3. The Degree for Continuous Functions
4. Some Applications of the Degree
5. Change of Variable Revisited


CHAPTER 17 EXTENSIONS OF DIFFERENTIABLE FUNCTIONS

1. Introduction
2. Reflection Across Hyperplanes
3. Regularized Distance
4. Reflection Across Lipschitz Graphs
5. Reflection of Hölder Functions
6. Reflection of Sobolev Functions
7. Extension from Lipschitz Graph Domains

PART I 


1 / Applications


1  TANGENT LINES


The origin of calculus was the problem of finding the tangent to a curve. Like
most geometric problems, this has an immediate appeal and is very tricky.
What is a curve? What is the tangent line? From a straight geometrical point
of view both questions are almost impossible.

The thing to do with impossible questions is to avoid them. In the first 
place, we shall not consider an arbitrary curve but rather the graph of a function. 
In the second place, we shall not attempt a geometric definition of the tangent 
line but shall use geometric intuition to come to an analytic definition. This 
has several advantages. The analytic definition is fairly easy to give. The 
notion that emerges is relevant not only to the tangent line, but also to other 
problems where the tangent line has no role. Finally, in an analytic setting 
the power of arithmetic and algebra can be brought to bear. 

Let f be a real-valued function defined on an interval I, and let a be an
interior point of I (i.e., not an end point). We ask for the tangent line to the
graph of f at the point (a, b), b = f(a) (Figure 1).

A straightforward preliminary notion is that of a chord through the point 
(a, b). It is simply a line through this point and some other point (x, y) of the 
graph. Geometric intuition says that the tangent line should be the limit of 
the chord as the point x approaches the point a. 

The idea of the limit of a family of lines may seem as nebulous as that of
the tangent line itself. The trick is to replace each line by a number and to
deal with a limit of numbers instead. The number to use gives the direction
of the line. It is called the slope.


DEFINITION 1.1  Let L be the line passing through the two points (a, b) and (x, y), a ≠ x.
The slope of L is the number m = (y − b)/(x − a).


Figure 1


Exercise 1  For the definition to make sense x must be different from a. What condition
does this impose on the line L?


Elementary trigonometry shows that the slope is the tangent of the counterclockwise
angle from the positive x axis to L. It is independent of the particular
points (a, b) and (x, y) chosen on L.


Exercise 2  Show that the slope is independent of the points (a, b) and (x, y) chosen on L
by using similar triangles.


Exercise 3  Find the equation of the line passing through the points (−1, 2) and (3, 6).
[Hint: Calculate the slope in two ways: first by using the two points (−1, 2)
and (3, 6), and then by using the two points (−1, 2) and (x, y).]
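
The definition of slope can be tried out numerically. The following is a minimal Python sketch, not part of the original text; the function names `slope` and `line_through` are illustrative choices, and exact rational arithmetic is assumed so that no rounding enters.

```python
from fractions import Fraction

def slope(p, q):
    """Slope m = (y - b)/(x - a) of the line through p = (a, b) and q = (x, y)."""
    (a, b), (x, y) = p, q
    if a == x:
        # The definition requires a != x; a vertical line has no slope.
        raise ValueError("the slope is undefined when a == x")
    return Fraction(y - b, x - a)

def line_through(p, q):
    """Return (m, c) such that the line through p and q is y = m*x + c."""
    m = slope(p, q)
    a, b = p
    return m, b - m * a

m, c = line_through((-1, 2), (3, 6))
print(m, c)  # slope and intercept of the line in Exercise 3
```

Any two distinct points of the same line give the same slope, which is the content of Exercise 2; the sketch only checks this for particular points and is no substitute for the argument by similar triangles.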


If (a, b) and (x, y) are points on the graph of f, then b = f(a) and y = f(x),
so the slope of the chord joining them is

\[ \frac{y - b}{x - a} = \frac{f(x) - f(a)}{x - a}. \]

According to our intuitive geometric reasoning, the slope of the tangent line to
the graph of f at the point (a, b) should be the limit of these numbers as x
approaches a.


2  DERIVATIVES


The limit of the numbers

\[ \frac{f(x) - f(a)}{x - a} \]

as x approaches a is called the derivative of f at the point a. It is written f'(a).
The result of Section 1 is that the tangent line to the graph of f at the
point (a, b) is the line passing through this point with slope f'(a). According to
the definition of slope, the equation of the tangent line is therefore

\[ \frac{y - b}{x - a} = f'(a). \]

Of course, the definition of a limit of numbers is lacking. Intuitively, the
limit of g(x) as x approaches a is the number l, if g(x) is as close as we please to
l for every x that is close enough to a. To be useful in real proofs the definition
must be given a precise quantitative form. Note that the distance between
two numbers z and w is |z − w|.


DEFINITION 2.1  The limit of g(x) as x approaches a is the number l, written
$\lim_{x \to a} g(x) = l$, if for each positive number ε there is a positive number δ such that
|g(x) − l| < ε whenever |x − a| < δ and x ≠ a.
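
The quantitative definition can be spot-checked on a computer. The sketch below is an illustration under stated assumptions, not a proof: the helper name `within_epsilon` is hypothetical, it tests only finitely many sample points, and it uses floating-point arithmetic. The function g is the quotient (x² − a²)/(x − a), which is undefined at x = a and equals x + a elsewhere, so any δ no larger than ε should work for the limit 2a.

```python
def within_epsilon(g, a, l, eps, delta, samples=1000):
    """Spot-check |g(x) - l| < eps at sample points x with 0 < |x - a| < delta."""
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)  # 0 < h < delta, so x != a
        for x in (a - h, a + h):
            if not abs(g(x) - l) < eps:
                return False
    return True

a = 3.0
g = lambda x: (x**2 - a**2) / (x - a)  # never evaluated at x = a

print(within_epsilon(g, a, 2 * a, eps=1e-3, delta=5e-4))  # delta small enough
print(within_epsilon(g, a, 2 * a, eps=1e-3, delta=1e-2))  # delta too large
```

The second call shows why δ must depend on ε: with δ ten times larger than ε, some sample points lie too far from a for g(x) to be within ε of the limit.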


The definition would seem to fit the intuitive idea of limit, but its real 
significance must come out of the results that can be obtained from it. Before 
taking these up (in most of the rest of the book), let us look at some examples in 
which the value of the limit is pretty clear. 

First, let f(x) = x². Then

\[ \frac{f(x) - f(a)}{x - a} = \frac{x^2 - a^2}{x - a} = x + a. \]

When x is close to a, x + a is close to a + a, so the limit is 2a; that is, f'(a) = 2a.


Example  Find the tangent line to the curve y = x² at the point (2, 4).

The slope is f'(2) = 4, and the equation is

\[ \frac{y - 4}{x - 2} = 4, \quad\text{or}\quad y = 4x - 4. \]
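
As a numerical companion to this example, the following Python sketch (with the illustrative names `difference_quotient` and `tangent_line`, which are not from the text) evaluates the chord slopes of y = x² near a = 2 and then writes the tangent line in the form y = mx + c.

```python
def difference_quotient(f, a, x):
    """Slope (f(x) - f(a))/(x - a) of the chord through (a, f(a)) and (x, f(x))."""
    return (f(x) - f(a)) / (x - a)

def tangent_line(m, a, b):
    """Return (m, c) so that y = m*x + c passes through (a, b) with slope m."""
    return m, b - m * a

f = lambda x: x * x

# Chord slopes as x approaches a = 2; they should settle near f'(2) = 4.
for h in (0.1, 0.01, 0.001):
    print(h, difference_quotient(f, 2.0, 2.0 + h))

print(tangent_line(4, 2, 4))  # coefficients of the tangent line at (2, 4)
```

The printed chord slopes differ from 4 by about h, in agreement with the formula x + a for the quotient.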


Next let f(x) = x³. Then

\[ \frac{f(x) - f(a)}{x - a} = \frac{x^3 - a^3}{x - a} = x^2 + xa + a^2. \]

When x is close to a, x² is close to a² and xa is close to a². Thus, the sum is
close to 3a² and f'(a) = 3a².

Let f(x) = x^n, where n is any positive integer. Then

\[ \frac{f(x) - f(a)}{x - a} = \frac{x^n - a^n}{x - a}
   = x^{n-1} + x^{n-2}a + x^{n-3}a^2 + \cdots + x a^{n-2} + a^{n-1}. \]

To see this call the right side R and consider (x − a)R = xR − aR. Each
term in xR cancels with the previous one in aR, so all terms cancel except the
first one in xR, which is x^n, and the last one in aR, which is a^n. It looks like this:

\[ \begin{aligned}
xR &= x^n + x^{n-1}a + x^{n-2}a^2 + \cdots + x^2 a^{n-2} + x a^{n-1}, \\
aR &= \phantom{x^n + {}} x^{n-1}a + x^{n-2}a^2 + \cdots + x^2 a^{n-2} + x a^{n-1} + a^n.
\end{aligned} \]

In the difference each term cancels with the one above or below it, leaving only
x^n − a^n.

The limit is a sum of n terms each of which is a^{n−1}. Therefore, f'(a) = na^{n−1}.


THEOREM 2.2  If f(x) = x^n, where n is a positive integer, then f'(x) = nx^{n−1}.


Exercise 1  When n is a negative integer the same formula holds at any point x ≠ 0. Try
to fashion an intuitive proof based on the one above. Discuss also the case
n = 0.
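
The cancellation argument behind Theorem 2.2 can be checked with exact integer arithmetic. In this short Python sketch the function name `chord_sum` and the sample values of x, a, and n are arbitrary choices made for illustration.

```python
def chord_sum(x, a, n):
    """R = x^(n-1) + x^(n-2)*a + ... + x*a^(n-2) + a^(n-1), a sum of n terms."""
    return sum(x ** (n - 1 - k) * a ** k for k in range(n))

x, a, n = 7, 3, 5

# (x - a) * R collapses to x^n - a^n, as in the display of xR and aR.
print((x - a) * chord_sum(x, a, n), x ** n - a ** n)

# Setting x = a gives n equal terms a^(n-1), the value of the limit.
print(chord_sum(a, a, n), n * a ** (n - 1))
```

Each printed pair should agree; the first pair is the identity used in the proof and the second is the value na^{n−1} of the derivative.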


The common functions occur as combinations, such as sums, products, 
and quotients, of a certain few functions, such as x^n, sin x, cos x, a^x, and log x.
To calculate the derivative of any common function, what is needed is the 
calculation for each one of the few functions, and then some rules to deal with 
combinations. The special calculations, even more than the general rules, 
involve points of considerable interest and difficulty. They are carried out in 
Chapter 2, as are the proofs of the general rules. Here we shall state without 
proof one simple general rule that can be used in conjunction with Theorem 2.2 
and Exercise 1 to illustrate the developing theory. 


THEOREM 2.3  If f(x) = αg(x) + βh(x), where α and β are real numbers, then f'(x) =
αg'(x) + βh'(x).


Example  Let f(x) = 3x² − (8/x). If g(x) = x², then by Theorem 2.2, g'(x) = 2x. If
h(x) = x^{−1}, then by Exercise 1, h'(x) = −x^{−2}. Therefore,

\[ f'(x) = 3 \cdot 2x + (-8)(-1)x^{-2} = 6x + \frac{8}{x^2}. \]
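
A quick numerical comparison supports this calculation. The Python sketch below is an assumption-laden check rather than a proof: it evaluates the difference quotient of Section 2 at a point very close to a (the step 10⁻⁶ is an arbitrary choice) and compares it with the formula 6x + 8/x².

```python
def f(x):
    return 3 * x**2 - 8 / x

def fprime(x):
    # The derivative obtained from Theorems 2.2 and 2.3 and Exercise 1.
    return 6 * x + 8 / x**2

def difference_quotient(func, a, x):
    return (func(x) - func(a)) / (x - a)

for a in (0.5, 1.0, 2.0, -3.0):
    approx = difference_quotient(f, a, a + 1e-6)
    print(a, fprime(a), approx)
```

The two columns agree to several decimal places at each sample point; the point 0 is excluded because f is not defined there.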


3  MAXIMUM AND MINIMUM PROBLEMS


One of the intriguing applications of the derivative comes in finding the maxi- 
mum and minimum values of a function and the points where they occur. The 
geometric idea is that if f has a maximum or minimum at the point a, then the 
tangent line to the graph at the point (a, f(a)) should be horizontal. In other 
words, its slope is 0, or, in still other words, f’(a) = 0. 

This is apparent geometrically, but it can be looked at analytically, too.
Suppose that f has a minimum at a. Then f(x) ≥ f(a) for every point x, which
means that the quotient

\[ g(x) = \frac{f(x) - f(a)}{x - a} \]

is ≥ 0 if x > a and is ≤ 0 if x < a. Let x approach a but be always > a. The
limit f'(a) must be ≥ 0, since it is the limit of numbers that are all ≥ 0. Now
let x approach a but be always < a. This time f'(a) must be ≤ 0, since it is the
limit of numbers that are all ≤ 0. Thus, f'(a) ≥ 0 and f'(a) ≤ 0, which leaves
only f'(a) = 0.


THEOREM 3.1  If f has a maximum or minimum at a and if f'(a) exists, then f'(a) = 0.

Now let us give a real proof using the formal Definition 2.1 of limit.


Proof  Suppose that f has a minimum at a. We assume that f'(a) > 0 and derive
a contradiction. [The contradiction is similar if we assume that f'(a) < 0.]

In Definition 2.1 take ε = ½f'(a) and find the corresponding positive
number δ such that

\[ |g(x) - f'(a)| < \varepsilon = \tfrac{1}{2} f'(a) \quad\text{if } |x - a| < \delta \text{ and } x \neq a. \]

Then

\[ g(x) > f'(a) - \tfrac{1}{2} f'(a) > 0 \quad\text{if } |x - a| < \delta \text{ and } x \neq a, \]

whereas we saw above that g(x) ≤ 0 for every x < a when f has a minimum
at a.


Example  A cylindrical barrel is to contain 1 ft³ of whiskey. What should be the dimensions
so that the barrel is built with the least amount of wood?

If x is the radius of the barrel and h is the height, then the volume is πx²h, so

\[ 1 = \pi x^2 h \quad\text{and}\quad h = \frac{1}{\pi x^2}. \]

The amount of wood used is essentially the surface area of the barrel, which is
the area of the top plus the area of the bottom plus the area of the cylindrical
side. Thus,

\[ \text{area} = \pi x^2 + \pi x^2 + 2\pi x h = 2\pi x^2 + \frac{2}{x}. \]

Therefore, the problem is to find the value of x at which the function

\[ f(x) = 2\pi x^2 + \frac{2}{x} \]

is minimum.

By Theorems 2.2 and 2.3 (the same calculation as in the last section) we
have

\[ f'(x) = 4\pi x - \frac{2}{x^2}. \]

If f has a minimum at a, then f'(a) = 0. Hence, 2πa³ = 1, or

\[ a = (2\pi)^{-1/3} \quad\text{and}\quad h = \frac{1}{\pi a^2} = 2(2\pi)^{-1/3} = 2a. \]
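
The arithmetic of the barrel example can be confirmed numerically. The Python sketch below uses hypothetical names (`area`, `area_prime`) and a finite grid of radii chosen only for illustration; comparing against grid points suggests, but does not prove, that the critical point is a minimum.

```python
import math

def area(x):
    """Surface area 2*pi*x^2 + 2/x of a barrel of radius x holding 1 cubic foot."""
    return 2 * math.pi * x**2 + 2 / x

def area_prime(x):
    """Derivative 4*pi*x - 2/x^2 of the area function."""
    return 4 * math.pi * x - 2 / x**2

a = (2 * math.pi) ** (-1 / 3)  # radius at which the derivative vanishes
h = 1 / (math.pi * a**2)       # corresponding height

print(a, h, h / a)             # the ratio h/a should be 2
print(area_prime(a))           # should be 0 up to rounding error

# Compare against a grid of other radii between 0.001 and 3.
print(all(area(a) <= area(k / 1000) for k in range(1, 3001)))
```

The grid comparison is consistent with the barrel of height 2a being the best one, but the question of whether a minimum exists at all still needs the theorem promised in the text.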


The legitimate conclusion of this is that if the problem does have a solution, 
then the best barrel is the one whose height is twice its radius. But it is not at 
all clear that the problem does have a solution. Perhaps the function f does 
not have a minimum. (Note that it certainly does not have a maximum.) 
It could well be that there is no best barrel for a cubic foot of whiskey! 

To settle this kind of question (from an amoral point of view, of course) we 
shall have to prove a theorem to the effect that under the right conditions a 
function must have a minimum or a maximum. 


Exercise 1  What are the right dimensions to make a rectangular field that contains 100
yd² of grass using the least amount of fencing?


Exercise 2  What is the shortest distance from the point (18, 0) to the curve y = x²?
Where is the closest point on the curve?


4  VELOCITY AND ACCELERATION


A physical problem, apparently unrelated to the geometrical problem of tangent 
lines, is the motion of an object along a straight line. An example is a falling 
body. 


Let coordinates be chosen on the line, and let s(t) be the coordinate of the
object at the time t. In elementary physics the average velocity over the time
interval from t = a to t = x is (by definition) the difference between the final
and initial positions divided by the length of the time interval. Thus,

\[ \text{average velocity} = \frac{s(x) - s(a)}{x - a}. \tag{1} \]

It is plain then how to define the velocity at the time t = a. It is the limit of the
average velocity as the time interval goes to 0. In other words, the velocity
at the time t = a is the derivative s'(a).

In this context it is natural to consider the velocity function, the function v
whose value at any time t is the velocity at that time. We have

\[ v(t) = s'(t) \quad\text{for each } t. \tag{2} \]

In elementary physics the average acceleration of the object over the time
interval from t = a to t = x is (by definition) the difference between the final
and initial velocities divided by the length of the time interval:

\[ \text{average acceleration} = \frac{v(x) - v(a)}{x - a}. \tag{3} \]

The acceleration at the time t = a is the limit of the average acceleration as the
time interval goes to 0. Thus, the acceleration at the time t = a is the derivative
v'(a).

The acceleration is the derivative of the derivative of s, which is called the
second derivative of s and is written s''.

Again, it is natural to consider the acceleration function a whose value at
any time t is the acceleration at that time:

\[ a(t) = v'(t) = s''(t) \quad\text{for each } t. \tag{4} \]


Example  A stone is dropped from the top of a 100-ft tree. When does it hit the ground?

What is known is the total force that acts on the stone. There is the force
of gravity pulling the stone down and the air resistance pushing the stone up.
Knowing these two forces, we must solve the problem.

The basis for the solution is the law of physics, the famous second law of
Newton, which states that the acceleration of an object is proportional to the
force acting on it. In other words, there is a constant c such that if a(t) is the
acceleration at time t and f(t) is the force acting on the object at time t, then

\[ a(t) = c f(t) \quad\text{for each } t. \tag{5} \]

(The constant c is determined by the units in which the acceleration and force
are measured.)


In our present case the air resistance is nearly negligible, and we shall
neglect it. The force of gravity is nearly constant. (It depends on the distance
between the stone and the center of the earth, which varies only 100 ft during
the fall.) We shall assume that it is constant. Therefore, according to formula
(5), the acceleration is constant. This constant, usually called g, has the value
of about 32 ft/sec/sec.

To proceed we have to choose the coordinates. The line of motion is the
line from the top of the tree to the center of the earth. Let the coordinates on
this line be such that the origin is at the surface of the earth and the positive
direction on the line is upward. Let the time be measured from the moment
the stone is dropped. Then the information we have is that

\[ v'(t) = a(t) = -32 \quad\text{and}\quad v(0) = 0. \]


The condition v(0) = 0 says that the stone has velocity 0 at the moment it is 
dropped. If it were thrown down with a speed vo, then the condition would be 
v(0) = —vo. The minus sign here and in the acceleration come from the fact 
that they are directed downward, while the positive direction on the line is 
upward. 

Now we see what the problem is: to determine the function v from its 
derivative and its value at one point. Once this is done the function s is to be 
determined from similar information. 

An obvious question occurs. To what extent is a function determined by
its derivative? The answer is as follows.


THEOREM 4.1  Two functions have the same derivative at each point of an interval if and
only if they differ by a constant.


Part of the theorem is easy. If f = g — h, then by Theorem 2.3, f’ = 
g’ —h’. If f is constant, then f’ = 0, so g’ = h’. The other part is not 
so easy. It is proved in Section 10 of Chapter 2. 


Now let us return to the problem of the stone. We have v'(t) = −32, and
we know from Theorem 2.2 that the derivative of −32t is also −32. According
to the present theorem, we must have v(t) = −32t + c for some constant c.
The value of c is determined by the condition v(0) = 0. Indeed, 0 = v(0) =
−32·0 + c. Therefore, c = 0, and

\[ v(t) = -32t. \tag{6} \]

Now for s. We have s'(t) = −32t, and we know from Theorem 2.2 that
the derivative of −16t² is also −32t. According to the present theorem we
must have s(t) = −16t² + d for some constant d. This time s(0) = 100, so
100 = s(0) = −16·0 + d. Therefore, d = 100, and

\[ s(t) = -16t^2 + 100. \tag{7} \]

When does the stone hit the ground? At the time t, when s(t) = 0 we
have

\[ -16t^2 + 100 = 0 \quad\text{or}\quad t = \tfrac{5}{2}. \]

With what velocity does it hit the ground? With the velocity

\[ v(\tfrac{5}{2}) = -32 \cdot \tfrac{5}{2} = -80. \]
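
Formulas (6) and (7) are easy to evaluate directly. In the Python sketch below the names `G`, `HEIGHT`, `velocity`, and `position` are illustrative, and the constants are the ones assumed in the text (32 ft/sec/sec, a 100-ft tree, no air resistance).

```python
import math

G = 32.0        # ft/sec/sec, the approximate value used in the text
HEIGHT = 100.0  # ft, the height of the tree

def velocity(t):
    """Formula (6): v(t) = -32 t."""
    return -G * t

def position(t):
    """Formula (7): s(t) = -16 t^2 + 100."""
    return -(G / 2) * t**2 + HEIGHT

# Positive solution of s(t) = 0.
t_hit = math.sqrt(2 * HEIGHT / G)

print(t_hit, position(t_hit), velocity(t_hit))  # prints 2.5 0.0 -80.0
```

The negative velocity reflects the choice of coordinates: the positive direction on the line is upward, and the stone is moving down.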


5  AREA


The problem is to find the area under the graph of a function. 

To be more precise, let f be a nonnegative function on an interval I, and
let a and b be two points of I with a < b. We want to find the area of the set
that is under the graph of f, above the x axis, and between the lines x = a and
x = b; that is, of the set

\[ \{(x, y) : a \le x \le b \ \text{and}\ 0 \le y \le f(x)\}. \]

Let $\int_a^b f$ denote the area of this set.

The number $\int_a^b f$ is called the integral of f from a to b. The symbol ∫ is
designed to be a peculiar letter S, standing for sum. When the integral is
defined properly (Chapter 3), it will appear as a limit of sums associated with
the function f, and area will be only one of many interpretations that can be
given to it.

It is not clear that the problem of area makes sense. Rectangles, triangles, 
etc., have areas, but there is little reason to believe that such general sets do. 
One thing is clear, however: If the area does make sense, then it ought to 
satisfy the following two conditions: 


A. If m ≤ f(x) ≤ M on a ≤ x ≤ b, then

\[ m(b - a) \le \int_a^b f \le M(b - a). \]

B. If a ≤ b ≤ c, then

\[ \int_a^c f = \int_a^b f + \int_b^c f. \]


The first condition says that if a set contains a rectangle, its area is larger 
than or equal to the area of the rectangle, whereas if it is contained in a rectangle, 
its area is smaller than or equal to the area of the rectangle. The second condi- 
tion says that if a set is cut into two parts by a vertical line, the area is the sum 
of the areas of the parts. 

In fact, it is not possible to define the area so that these two simple conditions
hold unless some restriction is put on the function f. One natural restriction
is that f be continuous at each point in the following sense:


DEFINITION 5.1  The function f on the interval I is continuous at the point a ∈ I if
$\lim_{x \to a} f(x) = f(a)$.


There is a detailed discussion of continuity in Chapters 2 and 3. For the
present it suffices to say that all the common functions are continuous except at
certain quite obvious points. For instance, the function f(x) = 1/x is continuous
at every point except 0, and there is no way to define it at 0 so that it becomes
continuous there. The same is true of the function f(x) = sin(1/x).


Exercise 1  Draw graphs of the functions f(x) = 1/x and f(x) = sin(1/x).

Exercise 2  If f has a derivative at the point a, then f is continuous at a.


(Don’t worry if there is difficulty with this one. The proof appears in 
Section 3 of the next chapter. The statement is given mainly to bear out the 
contention that the common functions really are continuous.) 

The theory of area works very well for continuous functions. 


Let f be continuous at each point of the interval I. 

(a) For any two points a and b of Tit is possible to define {° f so that 
conditions A and B above hold. 

(b) Let a be a fixed point of I, and for each x € I define F(x) = ff. 
Then F'(x) = f(x) for each x € I. 

(c) Let G satisfy G'(x) = f(x) for each x € I. Then for any two 
points a and b of J, [ey = G(b) — G(a). 


Part (c) of this astonishing theorem is what permits the calculation of 
integrals. 


Find the area under the curve y = x? between x = 0 and x = 2. 


According to the theorem, we should look for a function whose derivative 
is x*. One such is G(x) = x*/4, so 


area = G(2) — G(0) = 4. 
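
As a numerical sanity check on Example 1, the area can also be approximated by adding up the areas of thin rectangles, in the spirit of condition A. This is only an illustrative sketch: the helper name `midpoint_sum` and the choice of 1000 rectangles are assumptions made here, not part of the text.

```python
def midpoint_sum(f, a, b, n=1000):
    """Approximate the area under f between a and b with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def G(x):
    """An antiderivative of x**3, as chosen in Example 1."""
    return x**4 / 4


approx_area = midpoint_sum(lambda x: x**3, 0.0, 2.0)  # sum of rectangle areas
exact_area = G(2.0) - G(0.0)                          # part (c) of Theorem 5.2

print(approx_area, exact_area)
```

The two numbers should agree to several decimal places, which is what part (c) of the theorem predicts.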


Part (a) of the theorem is not easy to prove. Parts (b) and (c) can be 
proved now, but part (a) is postponed to Chapter 4. [Logically, however, 
parts (b) and (c) do not make sense without part (a) to show that they do.] 




It is technically convenient to use the symbol $\int_a^b f$ also when a > b. In
this case it is defined to be $-\int_b^a f$.


Exercise 3 Show that condition B holds, that is, that

    $\int_a^c f = \int_a^b f + \int_b^c f$

no matter what the relative positions of the three points.


Proof of Part (b) This is a situation that requires the quantitative definition of limit. It
must be shown that for every positive number $\varepsilon$ there is a positive number
$\delta$ such that

    $\left| \frac{F(x) - F(b)}{x - b} - f(b) \right| < \varepsilon$ if $|x - b| < \delta$ and $x \ne b$,

which is the same as

    $f(b) - \varepsilon < \frac{F(x) - F(b)}{x - b} < f(b) + \varepsilon$ if $|x - b| < \delta$ and $x \ne b$.    (1)

According to the definition of F and condition B,

    $F(x) - F(b) = \int_a^x f - \int_a^b f = \int_a^b f + \int_b^x f - \int_a^b f = \int_b^x f$.

Therefore, formula (1) is the same as

    $f(b) - \varepsilon < \frac{1}{x - b} \int_b^x f < f(b) + \varepsilon$ if $|x - b| < \delta$ and $x \ne b$.    (2)

Let $\varepsilon > 0$ be given. Use the fact that f is continuous at b to find
$\delta > 0$ such that $|f(y) - f(b)| < \varepsilon$ if $|y - b| < \delta$, hence such that

    $f(b) - \varepsilon < f(y) < f(b) + \varepsilon$ if $|y - b| < \delta$.    (3)

This is the $\delta$ required in formula (2). Indeed, let $|x - b| < \delta$ and x > b.
If y is in the interval between b and x, then $|y - b| < \delta$; so inequality (3)
holds. Therefore, by condition A,

    $(f(b) - \varepsilon)(x - b) < \int_b^x f < (f(b) + \varepsilon)(x - b)$.

Division by x - b gives inequality (2).


Exercise 4 In the final paragraph it is assumed that x > b. What happens when x < b?


Proof of Part (c) This one is easy now that part (b) is established. Indeed, G and F have
the same derivative. According to Theorem 4.1, they must differ by a
constant; that is, $G(x) = F(x) + \alpha$. In the difference G(b) - G(a) the
constant $\alpha$ cancels out, so

    $G(b) - G(a) = F(b) - F(a) = \int_a^b f - \int_a^a f = \int_a^b f$.


The integral shows up in a great variety of mathematical and physical
problems.


Example 2 A spring has a length of 1 ft when it is unstretched. Find the work done in
stretching it to a length of 2 ft.


First consider the physical background. Work is done when a force acts
through a distance. When the force is constant, the work is by definition the
product of the force and the distance.

In the present case the force is not constant. A characteristic of springs is
that the force is proportional to the amount of stretching. Let the spring be
anchored at the origin and stretched along the x axis, and let f(x) be the force
when the unanchored end is at the point x. The fact that the force is
proportional to the amount of stretching means that there is a constant c such that
f(x) = c(x - 1). The constant c is a quantity associated with the particular
spring, which is determined by experiment. Let us suppose that c = -1, so

    f(x) = -(x - 1).


Exercise 5 Why is c negative?


In general, let $W_a^b(f)$ be the work done by the force f as it acts through the
interval from a to b. Conditions A and B are clearly satisfied by W. The
first says that if the force is everywhere $\ge m$, then the work done is $\ge$ that
done by the constant force m, whereas if the force is everywhere $\le M$, then the
work done is $\le$ that done by the constant force M. The second says that if b
is between a and c, then the work done over the interval from a to c is the sum
of the work done from a to b and the work done from b to c.

Theorem 5.2 was proved solely on the basis of conditions A and B. Therefore,
if G'(x) = f(x), then

    $W_a^b(f) = G(b) - G(a) = \int_a^b f$.


In our particular case, $G(x) = -\frac{1}{2}x^2 + x$ satisfies G'(x) = f(x), so the
solution of the problem is

    work = G(2) - G(1) = $-\frac{1}{2}$.
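
The value obtained from G can be compared with a direct "force times distance" sum over many short steps, which is how conditions A and B suggest the work should behave. The sketch below is illustrative only; the names `spring_force` and `work_by_sums` and the step count are assumptions introduced here.

```python
def spring_force(x):
    """Force f(x) = c(x - 1) with c = -1, as supposed in Example 2."""
    return -(x - 1.0)


def work_by_sums(force, a, b, n=1000):
    """Add force * distance over n short steps, sampling the force at each midpoint."""
    step = (b - a) / n
    return sum(force(a + (i + 0.5) * step) * step for i in range(n))


def G(x):
    """The function G(x) = -x**2/2 + x from the text, with G'(x) = f(x)."""
    return -0.5 * x**2 + x


work_numeric = work_by_sums(spring_force, 1.0, 2.0)
work_exact = G(2.0) - G(1.0)

print(work_numeric, work_exact)
```

Both computations use the same sign convention for the force, so they agree with each other; whether that sign is the physically appropriate one is a separate question.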


Exercise 6 Is this really the solution, or should the solution be $+\frac{1}{2}$?


2 Calculation of Derivatives


1 LIMITS


The statement $\lim_{x \to a} g(x) = l$ has been discussed in two cases. In the first,
the definition of the derivative, g is the quotient

    $g(x) = \frac{f(x) - f(a)}{x - a}$,



where f is a function defined on some interval and a is an interior point of the 
interval (i.e., not an end point). In this case g is defined at every point suffi- 
ciently close to a, except for a itself. 

In the second case, the definition of continuity, g is a function defined on 
some interval J and a is a point of J, quite possibly an end point. 

In general g is a function defined on some set S, a is a point that may or
may not belong to S, and l is a number. There are two ideas to be expressed.
The first is that there are points of S as close as we please to a. The second is
that g(x) is as close as we please to l if x is in S and is close enough to a.

These ideas are relevant in a wide range of situations. There is no reason
that the set S on which g is defined must be a set of real numbers, or that the
values of g must be real. What is necessary is that there be a distance,
so that it makes sense to say that x is close to a and that g(x) is close to l.
For instance, either set could be the plane, or the three-dimensional space. 
This general point of view will be necessary in the end, but for the time 
being it will be simpler to stick to real-valued functions defined on sets of real 
numbers. 


DEFINITION 1.1 Let g be a real-valued function on a set S of real numbers. Let a and l be
real numbers. The statement

    $\lim_{\substack{x \to a \\ x \in S}} g(x) = l$

means that

(a) For each positive number $\delta$ there is at least one point $x \in S$ with
$|x - a| < \delta$.
(b) For each positive number $\varepsilon$ there is a positive number $\delta$ such that if
$|x - a| < \delta$ and $x \in S$, then $|g(x) - l| < \varepsilon$.


THEOREM 1.2 If a limit exists, it is unique.


Proof Suppose that

    $\lim_{\substack{x \to a \\ x \in S}} g(x) = l$ and $\lim_{\substack{x \to a \\ x \in S}} g(x) = m$.

Let $\varepsilon$ be a positive number. (We shall see how small to take it at the
end.) Find $\delta_1 > 0$ so that

    $|g(x) - l| < \varepsilon$ if $|x - a| < \delta_1$ and $x \in S$,

and find $\delta_2 > 0$ so that

    $|g(x) - m| < \varepsilon$ if $|x - a| < \delta_2$ and $x \in S$.

If $\delta$ is the smaller of the two numbers $\delta_1$ and $\delta_2$, then by condition (a) in
the definition there is at least one point $x \in S$ with $|x - a| < \delta$. For
this point x we have

    $|l - m| \le |l - g(x)| + |g(x) - m| < \varepsilon + \varepsilon = 2\varepsilon$.

If $l \ne m$, then we can take $\varepsilon < \frac{1}{2}|l - m|$ and obtain the contradiction

    $|l - m| < 2\varepsilon < |l - m|$.


It is not true, of course, that a limit always exists. Consider the function 1/x
on the set S = {x: x ≠ 0}. It is clear that for every interval I containing 0 this
function is unbounded on $I \cap S$. On the other hand, we have the following
theorem.


THEOREM 1.3 If the limit

    $\lim_{\substack{x \to a \\ x \in S}} g(x)$

exists, then there is an interval I with center a such that g is bounded on $I \cap S$.


Proof Let l be the limit and find $\delta > 0$ corresponding to $\varepsilon = 1$. If I is the
interval $\{x : |x - a| < \delta\}$, then

    $|g(x)| \le |g(x) - l| + |l| < 1 + |l|$ for $x \in I \cap S$.


It is not true either that a limit always exists when the function g is 
bounded. Give a couple of examples. 


THEOREM 1.4 If $g(x) \ge 0$ everywhere on S and if

    $\lim_{\substack{x \to a \\ x \in S}} g(x)$

exists, then

    $\lim_{\substack{x \to a \\ x \in S}} g(x) \ge 0$.


Proof Suppose that the limit l is negative, let $\varepsilon = -l$, and find the corresponding
$\delta$. By condition (a) there is at least one point $x \in S$ with $|x - a| < \delta$.
For this point we have

    $g(x) < l + \varepsilon = 0$,

while by hypothesis $g(x) \ge 0$.


Exercise 1 Suppose that

    $\lim_{\substack{x \to a \\ x \in S}} g(x)$

exists and that there is an interval I with center a such that $\alpha \le g(x) \le \beta$ for
every $x \in I \cap S$. Then

    $\alpha \le \lim_{\substack{x \to a \\ x \in S}} g(x) \le \beta$.

There are some special cases that are particularly common and useful and 

have their own particular names. 


DEFINITION 1.5 When S = I - {a}, where I is an interval and a is an interior point (i.e.,
not an end point), we write

    $\lim_{\substack{x \to a \\ x \ne a}} g(x)$ for $\lim_{\substack{x \to a \\ x \in S}} g(x)$.

When S = I - {a}, and a is the right end point of I, we write

    $\lim_{\substack{x \to a \\ x < a}} g(x)$,

and call the limit the left-hand limit.

When S = I - {a} and a is the left end point of I, we write

    $\lim_{\substack{x \to a \\ x > a}} g(x)$,

and call the limit the right-hand limit.


Exercise 2 The limit

    $\lim_{\substack{x \to a \\ x \ne a}} g(x)$

exists if and only if both the left- and right-hand limits exist and are equal.
(Then, of course, they are equal to the limit.)


2 LIMITS AND DERIVATIVES


Let f be a real-valued function defined on an interval J and let a be an interior
point of J.


DEFINITION 2.1 If the limit

    $\lim_{\substack{x \to a \\ x \ne a}} \frac{f(x) - f(a)}{x - a}$

exists, then f is differentiable at the point a. The value of the limit is called
the derivative of f at a, or f'(a).


The left- and right-hand derivatives are defined in the same way with
limit replaced by left- or right-hand limit.
Of course, the derivative does not always exist.


Exercise 1 The derivative exists if and only if the left- and right-hand derivatives both
exist and are equal.


Exercise 2 Let f(x) = |x|. The left- and right-hand derivatives exist at every point.
They are equal at every point except 0 and are different at 0.


Let us calculate the derivative of the function $f(x) = x^n$, where n is a
positive integer. We have seen in Section 2 of Chapter 1 that

    $\frac{f(x) - f(a)}{x - a} = x^{n-1} + x^{n-2}a + x^{n-3}a^2 + \cdots + xa^{n-2} + a^{n-1}$,    (1)

so we have to calculate the limit of this sum. In Section 2 of Chapter 1 we
reasoned that each term in the sum has the limit $a^{n-1}$ and that there are n such
terms, so the limit should be $na^{n-1}$. The justification of this reasoning calls
for a theorem on limits of sums and products.


THEOREM 2.2 Let

    $\lim_{\substack{x \to a \\ x \in S}} g(x) = l$ and $\lim_{\substack{x \to a \\ x \in S}} h(x) = m$.

(a) If f = g + h, then

    $\lim_{\substack{x \to a \\ x \in S}} f(x) = l + m$.

(b) If f = gh, then

    $\lim_{\substack{x \to a \\ x \in S}} f(x) = lm$.

(c) If f = g/h, then

    $\lim_{\substack{x \to a \\ x \in S}} f(x) = l/m$ provided $m \ne 0$.


[In these statements it is assumed tacitly that f, g, and h are all defined on the
same set S. In part (c) this requires that $h(x) \ne 0$ for all $x \in S$. However,
see the exercises.]

The theorem is applied in the following way. Part (b) shows that the
limit of each term in (1) is $a^{n-1}$, and part (a) shows that the limit of the sum is
$na^{n-1}$. This is not quite fair, since the theorem deals with the sum and product
of two functions, while here there are sums and products of several. (The
typical term in the sum is $x^{n-k-1}a^k$, which should be thought of as a product of
n - 1 factors, k of them equal to a and n - k - 1 of them equal to x. Each
factor obviously has the limit a, so there are n - 1 factors each with the limit a.)
The case of several functions follows easily from the case of two, with the result
that the limit of a sum is the sum of the limits, and the limit of a product is the product
of the limits, no matter how many functions are involved.
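
Formula (1) and the predicted limit $na^{n-1}$ can be checked numerically. The following is a small sketch rather than part of the argument; the function names and the sample values a = 1.5, n = 5 are arbitrary choices made here for illustration.

```python
def sum_formula(x, a, n):
    """Right side of formula (1): x**(n-1) + x**(n-2)*a + ... + a**(n-1)."""
    return sum(x ** (n - 1 - k) * a**k for k in range(n))


def difference_quotient(x, a, n):
    """Left side of formula (1) for f(x) = x**n (requires x != a)."""
    return (x**n - a**n) / (x - a)


a, n = 1.5, 5
predicted_limit = n * a ** (n - 1)

# Distance from the predicted limit as x moves closer to a.
gaps = [abs(sum_formula(a + d, a, n) - predicted_limit) for d in (1e-1, 1e-2, 1e-3)]
print(gaps)
```

The gaps shrink as x approaches a, as the theorem on limits of sums and products says they must.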


Proof of the Theorem The idea is always to estimate the quantity that must be proved to be small
by means of those that are known to be small.

First take part (a). The quantity that must be proved to be small is
$|f(x) - (l + m)|$, and those that are known to be small are $|g(x) - l|$ and
$|h(x) - m|$. In this case we have the estimate

    $|f(x) - (l + m)| = |g(x) - l + h(x) - m| \le |g(x) - l| + |h(x) - m|$.

If $\varepsilon$ is a given positive number, then $\varepsilon/2$ is also a positive number; so we
can find a positive number $\delta$ such that

    $|g(x) - l| < \varepsilon/2$ and $|h(x) - m| < \varepsilon/2$
    whenever $|x - a| < \delta$ and $x \in S$.

Then

    $|f(x) - (l + m)| < \varepsilon/2 + \varepsilon/2 = \varepsilon$ whenever $|x - a| < \delta$ and $x \in S$.

Strictly speaking, we should find first a $\delta_1$ for g and a $\delta_2$ for h, and
then take $\delta$ to be the minimum of $\delta_1$ and $\delta_2$. Usually some of these
intermediate steps are skipped, and $\delta$ is chosen so as to satisfy several
conditions simultaneously.

Part (b) is more complicated. In this case the quantity that must
be proved to be small is $|f(x) - lm|$, and those that are known to be
small are again $|g(x) - l|$ and $|h(x) - m|$. There is a trick that is almost
always used with products, which is to add and subtract the same number,
in this case the number $l\,h(x)$. We have

    $|f(x) - lm| = |g(x)h(x) - l\,h(x) + l\,h(x) - lm|$
    $\le |g(x) - l|\,|h(x)| + |l|\,|h(x) - m|$.

The term $|l|\,|h(x) - m|$ is not at all troublesome. [If $|h(x) - m|$ is small,
then so is $|l|\,|h(x) - m|$.] The term $|g(x) - l|\,|h(x)|$ could be. It is
conceivable that although $|g(x) - l|$ is small, $|h(x)|$ is big, so that the product
is not small. This is covered by Theorem 1.3.

We proceed as follows. Let $\varepsilon$ be a given positive number. First
choose a positive number $\delta_0$ and a positive number M so that $|h(x)| \le M$
whenever $|x - a| < \delta_0$ and $x \in S$. Then $\varepsilon/(|l| + M)$ is also a positive
number, so we can find a positive number $\delta_1$ such that

    $|g(x) - l| < \frac{\varepsilon}{|l| + M}$ and $|h(x) - m| < \frac{\varepsilon}{|l| + M}$
    whenever $|x - a| < \delta_1$ and $x \in S$.

If $\delta$ is the minimum of $\delta_0$ and $\delta_1$, and if $|x - a| < \delta$ and $x \in S$, then

    $|f(x) - lm| < \frac{\varepsilon}{|l| + M}\,M + |l|\,\frac{\varepsilon}{|l| + M} = \varepsilon$.

In doing part (c) we can take account of part (b) and suppose that g
is the constant 1, in which case we have

    $\frac{1}{h(x)} - \frac{1}{m} = \frac{m - h(x)}{m\,h(x)}$.


The point here is to make sure that the denominator of the fraction is not
too small. We know that the numerator is small, but if the denominator
were also small, then the fraction could be big.

Since $|m|/2$ is a positive number, we can find $\delta_0$ so that $|h(x) - m| < |m|/2$
whenever $|x - a| < \delta_0$ and $x \in S$. Then

    $|h(x)| = |m - (m - h(x))| \ge |m| - \frac{|m|}{2} = \frac{|m|}{2}$ if $|x - a| < \delta_0$ and $x \in S$;

therefore,

    $\left| \frac{1}{h(x)} - \frac{1}{m} \right| \le \frac{2\,|h(x) - m|}{m^2}$ whenever $|x - a| < \delta_0$ and $x \in S$.

Now let $\varepsilon$ be a given positive number. Then $m^2\varepsilon/2$ is also a positive
number, and we can find $\delta_1$ so that

    $|h(x) - m| < m^2\varepsilon/2$ whenever $|x - a| < \delta_1$ and $x \in S$.

Taking $\delta$ to be the smaller of $\delta_0$ and $\delta_1$, we have the inequality required.
The three parts of the theorem are now proved.
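
To see the two-step choice of $\delta$ in part (c) at work, take the simplest case h(x) = x near the point a = m, where $|h(x) - m| = |x - m|$ and so $\delta_0 = |m|/2$ and $\delta_1 = m^2\varepsilon/2$ can be used directly. This is a sketch under that assumption; the function name and the sample values are hypothetical choices made here.

```python
def delta_for_reciprocal(m, eps):
    """Delta from the proof of part (c), specialized to h(x) = x and a = m.

    delta_0 = |m|/2 keeps h(x) away from 0;
    delta_1 = m**2 * eps / 2 makes the numerator small enough.
    """
    delta_0 = abs(m) / 2
    delta_1 = m * m * eps / 2
    return min(delta_0, delta_1)


m, eps = 0.5, 0.01
delta = delta_for_reciprocal(m, eps)

# Sample points with |x - m| < delta and check that |1/x - 1/m| < eps there.
samples = [m + delta * t for t in (-0.999, -0.5, 0.25, 0.999)]
worst = max(abs(1 / x - 1 / m) for x in samples)
print(delta, worst)
```

Checking a few sample points is of course not a proof; it only illustrates that the $\delta$ produced by the argument behaves as claimed.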


THEOREM 2.3 If $f(x) = x^n$, where n is any integer, then $f'(a) = na^{n-1}$, provided $a \ne 0$
if n is negative.

Proof The theorem is already proved if n is positive. It is obvious if n = 0.
Let n be negative, say n = -k. Then

    $\frac{f(x) - f(a)}{x - a} = \frac{1}{x - a}\left( \frac{1}{x^k} - \frac{1}{a^k} \right) = \left( -\frac{x^k - a^k}{x - a} \right) \frac{1}{x^k a^k}$.

We have seen already that the limit of the first factor is $-ka^{k-1}$ and by
Theorem 2.2(c) that the limit of the second is $1/a^{2k}$. Therefore, the limit
of the product is

    $-ka^{k-1} \cdot \frac{1}{a^{2k}} = -ka^{-k-1} = na^{n-1}$.


Exercise 3 If a is an interior point of an interval I, then

    $\lim_{\substack{x \to a \\ x \in I \cap S}} f(x) = \lim_{\substack{x \to a \\ x \in S}} f(x)$.

(That is, if one of the two limits exists, then so does the other, and they are equal.)


Exercise 4 Theorem 2.2(c) can be improved as follows.

Under the hypotheses of Theorem 2.2(c) there is an interval I with center a such
that $h(x) \ne 0$ for all $x \in I \cap S$. In this case f is defined on $I \cap S$ and

    $\lim_{\substack{x \to a \\ x \in I \cap S}} f(x) = l/m$.


3 DERIVATIVES OF SUMS, PRODUCTS, AND QUOTIENTS


The theorems on the limits of sums, products, and quotients lead to theorems
on the derivatives of sums, products, and quotients. One additional fact is
needed first, however.


THEOREM 3.1 If f is differentiable at a, then

    $\lim_{\substack{x \to a \\ x \ne a}} f(x) = f(a)$.


Proof The number 1 is positive, so we can find a positive number $\delta_0$ such that

    $\left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| < 1$ whenever $|x - a| < \delta_0$ and $x \ne a$.

Hence

    $|f(x) - f(a)| < |x - a|\,(|f'(a)| + 1)$ whenever $|x - a| < \delta_0$ and $x \ne a$.

If $\varepsilon$ is any given positive number, choose $\delta_1$ so that

    $\delta_1(|f'(a)| + 1) < \varepsilon$; that is, $\delta_1 < \varepsilon/(|f'(a)| + 1)$.

If $\delta$ is the smaller of $\delta_0$ and $\delta_1$, then we have

    $|f(x) - f(a)| < \varepsilon$ whenever $|x - a| < \delta$.


THEOREM 3.2 Let g and h be differentiable at a. Then g + h and gh are differentiable
at a, and so is g/h if $h(a) \ne 0$. Moreover,

(a) $(g + h)' = g' + h'$,
(b) $(gh)' = g'h + gh'$,
(c) $(g/h)' = (g'h - gh')/h^2$,

where all functions and derivatives are calculated at a.
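
Formulas (b) and (c) can be compared against difference quotients for a concrete pair of functions. The sketch below uses $g(x) = x^3$ and $h(x) = x^2 + 1$, whose derivatives are known from Theorem 2.3; these sample functions, the point a = 0.7, and the helper `central_difference` are illustrative assumptions, not part of the text.

```python
def central_difference(f, a, step=1e-5):
    """Symmetric difference quotient, a numerical stand-in for f'(a)."""
    return (f(a + step) - f(a - step)) / (2 * step)


def g(x):
    return x**3


def dg(x):
    return 3 * x**2


def h(x):
    return x**2 + 1


def dh(x):
    return 2 * x


a = 0.7
product_formula = dg(a) * h(a) + g(a) * dh(a)                   # formula (b)
quotient_formula = (dg(a) * h(a) - g(a) * dh(a)) / h(a) ** 2    # formula (c)

product_numeric = central_difference(lambda x: g(x) * h(x), a)
quotient_numeric = central_difference(lambda x: g(x) / h(x), a)

print(product_formula, product_numeric)
print(quotient_formula, quotient_numeric)
```

Agreement to many decimal places is expected here, although a numerical comparison at one point does not replace the proof.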


Proof Part (a) follows directly from Theorem 2.2(a). For part (b), if f = gh,
then

    $\frac{f(x) - f(a)}{x - a} = \frac{g(x)h(x) - g(a)h(x) + g(a)h(x) - g(a)h(a)}{x - a}$
    $= \frac{g(x) - g(a)}{x - a}\,h(x) + g(a)\,\frac{h(x) - h(a)}{x - a}$.

Part (b) then follows from Theorems 2.2 and 3.1.
In proving part (c) we can take account of part (b) and suppose that
g is the constant 1, in which case the formula to be proved becomes

    $(1/h)' = -h'/h^2$.    (c')

If f = 1/h, then

    $\frac{f(x) - f(a)}{x - a} = \frac{1/h(x) - 1/h(a)}{x - a} = \frac{h(a) - h(x)}{x - a} \cdot \frac{1}{h(x)h(a)}$.

Formula (c') follows from Theorems 2.2 and 3.1.


Note that if h is the constant $\alpha$, then h'(a) = 0, and part (b) of the
theorem gives

    $(\alpha g)' = \alpha g'$ if $\alpha$ is a constant.

Combining this with part (a) of the theorem, we get


COROLLARY 3.3 If g and h are differentiable at a, and $\alpha$ and $\beta$ are real numbers, then $\alpha g + \beta h$
is differentiable at a, and

    $(\alpha g + \beta h)' = \alpha g' + \beta h'$.

This was the assertion of Theorem 2.3, Chapter 1.


Remark Theorem 3.1 says that if f is differentiable at a, then f is continuous at a.


Exercise 1 Prove Theorem 2.3 (the derivative of $x^n$) by using Theorem 3.2 and
mathematical induction.


Exercise 2 Calculate the derivative of


Exercise 3 For a function to be differentiable at a point a it must be defined at least on
some interval with a as an interior point. Is this condition met in the case of
the sum, product, and quotient in Theorem 3.2?




4 CONTINUITY 


The purpose of this section is to list some of the simplest properties of continuity, 
those that follow directly from the theorems already proved about limits. Some 
of the deeper properties have very far-reaching consequences, which will be 
explained as they come up (for example in Chapter 3). 


DEFINITION 4.1 A function f on a set S is continuous at the point $a \in S$ if

    $\lim_{\substack{x \to a \\ x \in S}} f(x) = f(a)$.

The function f is continuous on S, or simply continuous, if it is continuous
at each point $a \in S$.


This is the same definition that was given in Section 5 of Chapter 1, except 
that the earlier definition applied only to functions defined on an interval. 
If the definition of limit is written out in full, then Definition 4.1 becomes 


DEFINITION 4.2 A function f on a set S is continuous at the point $a \in S$ if for every positive
number $\varepsilon$ there is a positive number $\delta$ such that if $|x - a| < \delta$ and $x \in S$,
then

    $|f(x) - f(a)| < \varepsilon$.

In terms of continuity the assertion of Theorem 3.1 is as follows.


THEOREM 4.3 If f is differentiable at a, then f is continuous at a.

Exercise 1 The hypothesis that f is differentiable at a implies that f is defined at least on 
some interval with center a. Apart from this requirement, the set S on which 
f is defined is immaterial. 


Exercise 2. Show that the function f(x) = |x| is continuous but not differentiable at 0. 


Exercise 3 Let f(x) = 0 if x is irrational and f(x) = 1/q if x = p/q, where p is an integer,
q is a positive integer, and the two have no common factor. Show that f is
continuous at each irrational point, discontinuous at each rational point, and
differentiable nowhere.


There are even examples of functions that are continuous at every point 
but differentiable at no point, but these are not easy to construct. 


Theorem 2.2 on the limits of sums, products, and quotients gives the following
theorem on the continuity of sums, products, and quotients.


THEOREM 4.4 If f and g are continuous at a, then f + g and fg are continuous at a, and so
is f/g if $g(a) \ne 0$.


Here it is assumed that f and g are defined on the same set S, in which case
the sum and product are defined on S and the quotient is defined on
$\{x : x \in S \text{ and } g(x) \ne 0\}$.


Exercise 4 Show that there is an interval I with center a such that the quotient is defined
at all points of $S \cap I$, the assumptions being those of the theorem.


Exercise 5 State and prove a theorem on the continuity of a composite function. The
composite function is defined as follows. Let f be defined on S and g be defined
on T. The composite function h is defined on the set $\{x : x \in T \text{ and } g(x) \in S\}$.
For any point x in this set, h(x) = f(g(x)). If T is an interval with center a
and S is an interval with center b = g(a), what can be said about the set on
which h is defined? What are some simple choices for f and g if $h(x) = \sqrt{\sin x}$,
$h(x) = \sin \sqrt{x}$, or $h(x) = (x - 1)^n$?


5 TRIGONOMETRIC FUNCTIONS


In plane geometry an angle is an ordered pair of half-lines with a common
initial point. The trigonometric functions are defined as follows. Translate
and rotate the angle so that the initial point is at the origin of the coordinates
of the plane and so that the first half-line coincides with the positive x axis.
Then the sine of the angle is y, the cosine is x, the tangent is y/x, and so on,
where (x, y) is the point where the second half-line meets the unit circle (the
circle with center at the origin and radius 1).

In calculus we do not deal with functions defined on the set of ordered
pairs of half-lines, but rather with functions defined on sets of real numbers.


DEFINITION 5.1 Let $\theta$ be a real number. Let (x, y) be the point obtained by starting at the
point (1, 0) and traveling counterclockwise along the unit circle a distance $\theta$
if $\theta \ge 0$ (clockwise a distance $-\theta$ if $\theta < 0$) (Figure 1). Then

    $\sin\theta = y$,  $\cos\theta = x$,  $\tan\theta = \frac{y}{x}$,

    $\csc\theta = \frac{1}{y}$,  $\sec\theta = \frac{1}{x}$,  $\cot\theta = \frac{x}{y}$.


[Figure 1]


Exercise 1 The sine and cosine are defined for all real $\theta$, but the others are undefined at
certain exceptional values of $\theta$. What are the exceptional values in each case?


Exercise 2 Explain the relation between Definition 5.1 and the geometric definition, and
prove the familiar "addition formulas"

    $\sin(\theta + \varphi) = \sin\theta \cos\varphi + \cos\theta \sin\varphi$,
    $\cos(\theta + \varphi) = \cos\theta \cos\varphi - \sin\theta \sin\varphi$.    (1)


The calculation of the derivatives of the trigonometric functions is based
on the addition formulas and on two inequalities:

    $|\sin\theta| \le |\theta|$ for every real $\theta$,    (2)
    $\theta \le \tan\theta$ for $0 \le \theta < \pi/2$.    (3)


From a geometrical point of view these inequalities are easy to prove. Inequality
(2) says simply that the perpendicular distance from the point P = (x, y) to the
x axis is less than the distance along an arc of the unit circle. Inequality (3)
says that the area of the sector S = ORP is less than the area of the triangle
T = ORQ. To see this, note that T is a right triangle with base 1 and height
y/x. Therefore,

    area T = $\frac{1}{2} \cdot \frac{y}{x} = \frac{\tan\theta}{2}$.

As for the sector, the ratio of the area of S to the area of the whole circle is equal
to the ratio of the arc length $\theta$ to the arc length of the whole circle. In other
words (since the radius of the circle is 1),

    $\frac{\text{area } S}{\pi} = \frac{\theta}{2\pi}$;

hence

    area S = $\frac{\theta}{2}$.

Thus, the inequality (3) does say exactly that area S $\le$ area T (which is obviously
true since $S \subset T$).


Remark Whether this argument, or even Definition 5.1 itself, can be considered rigorous
is a question that can be debated pretty hotly. On the one hand, they make use 
of the notions of arc length and area, which have not even been defined. These 
are rather complicated notions in which too-free use of intuition can lead quickly 
to trouble. On the other hand, we are not dealing here with the arc length of 
some complicated curve or the area of some complicated figure, but just with 
arcs and sectors of circles. In this case the geometric argument is very con- 
vincing, and, after all, the final test of rigor is whether the argument is really 
convincing. So the question is debatable. Presently, we shall be able to end 
the debate in either of two ways. One is to provide a sound general theory 
of arc length and area so that the above arguments no longer have to appeal to 
geometric intuition. The other is to take an entirely different point of view 
and to define the trigonometric functions by certain “infinite sums” instead of 
by Definition 5.1. Such a definition appears more complicated in the beginning,
but in the end it is much easier to work with. For those who want to, it is all 
right to take the results of this section as provisional until the time (Chapters 
4 and 6) when the more sophisticated methods are ready. 


Now let us calculate the derivatives of the sine and cosine on the basis of
the addition formulas (1), the inequalities (2) and (3), and the identity

    $\sin^2\theta + \cos^2\theta = 1$ for all real $\theta$,    (4)

which comes from the fact that the point (x, y), $x = \cos\theta$, $y = \sin\theta$, is on the
unit circle. Take any real number h and put $\theta = \varphi = h/2$ in the addition
formula for the cosine, and then use the identity (4). The result is

    $\cos h = \cos^2\frac{h}{2} - \sin^2\frac{h}{2} = 1 - 2\sin^2\frac{h}{2}$;

hence

    $1 - \cos h = 2\sin^2\frac{h}{2}$ for all real h.    (5)

Thus, the inequality (2) gives

    $\left| \frac{1 - \cos h}{h} \right| = \frac{\sin^2(h/2)}{|h/2|} \le \frac{|h|}{2}$    (6)

and in particular that

    $\lim_{\substack{h \to 0 \\ h \ne 0}} \frac{1 - \cos h}{h} = 0$.    (7)

Inequalities (2) and (3) together show that

    $\cos h \le \frac{\sin h}{h} \le 1$ for $|h| < \frac{\pi}{2}$, $h \ne 0$.    (8)

If this is combined with (7), it shows that

    $\lim_{\substack{h \to 0 \\ h \ne 0}} \frac{\sin h}{h} = 1$.    (9)
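
Inequalities (6) and (8) are easy to test numerically for a few values of h, positive and negative. This is a quick sketch using Python's standard `math` module; the particular sample values of h are arbitrary choices made here.

```python
import math

sample_h = [1.0, 0.5, 0.1, 0.01, -0.01, -0.5]

# For each h: (h, cos h, sin h / h, (1 - cos h) / h)
rows = [(h, math.cos(h), math.sin(h) / h, (1 - math.cos(h)) / h) for h in sample_h]

# Inequality (8): cos h <= sin h / h <= 1
squeeze_holds = all(c <= s <= 1 for _, c, s, _ in rows)

# Inequality (6): |(1 - cos h) / h| <= |h| / 2
bound_holds = all(abs(q) <= abs(h) / 2 for h, _, _, q in rows)

for row in rows:
    print(row)
```

As h shrinks, the third column is squeezed toward 1 and the fourth toward 0, in line with (9) and (7).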


Exercise 3 Why does inequality (8) hold for $|h| < \pi/2$ and not just for $0 < h < \pi/2$?


Exercise 4 Write out the proofs of (7) and (9) with e’s and 6’s. 


THEOREM 5.2 If f(x) = sin x, then f'(x) = cos x. If f(x) = cos x, then f'(x) = -sin x.

Proof It is clear in general that

    $f'(a) = \lim_{\substack{h \to 0 \\ h \ne 0}} \frac{f(a + h) - f(a)}{h}$,    (10)

which is a more convenient formula than the original for making use of
the addition formulas.


Exercise 5 Prove the obvious formula (10).


In the case of the sine we have

    $\frac{\sin(a + h) - \sin a}{h} = \frac{\sin a \cos h + \cos a \sin h - \sin a}{h}$
    $= \sin a \, \frac{\cos h - 1}{h} + \cos a \, \frac{\sin h}{h}$,

so the result follows from formulas (7) and (9) (and the theorem on the
limit of a sum or product).


Exercise 6 In the case of the cosine the proof is similar. Carry it out. 


Exercise 7 Express each of the trigonometric functions in terms of the sine and cosine, and 
calculate its derivative. 


Exercise 8 Find the maximum of the function f(x, y) = 3x - 2y on the unit circle. [Hint:
(x, y) is on the unit circle if and only if $x = \cos\theta$ and $y = \sin\theta$ for some $\theta$.]


6 COMPOSITE FUNCTIONS


Another way of combining two functions is to follow one by the other. The
result is called the composite function. The main theorem, called the chain rule,
is as follows.


THEOREM 6.1 (Chain Rule) If g is differentiable at a and f is differentiable at g(a) = b,
then the composite function h(x) = f(g(x)) is differentiable at a, and

    h'(a) = f'(b)g'(a) = f'(g(a))g'(a).


Example 1 The function $h(x) = (x - 1)^3$ is the composite of $f(y) = y^3$ and g(x) = x - 1.
Therefore, $h'(a) = 3(a - 1)^2 \cdot 1 = 3(a - 1)^2$.


Example 2 Let g be the function whose graph is the top half of the unit circle [the circle
with center (0, 0) and radius 1]. The equation of the unit circle is $x^2 + y^2 = 1$,
so g satisfies the equation

    $x^2 + g(x)^2 = 1$.

Take the derivative on both sides, considering $g(x)^2$ as the composite of g with
$f(y) = y^2$. This gives

    $2g(x)g'(x) + 2x = 0$.

Therefore, since $g(x) = \sqrt{1 - x^2}$,

    $g'(x) = -\frac{x}{\sqrt{1 - x^2}}$.
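
The formula found in Example 2 can be compared with a difference quotient at a sample point. The sketch below is an illustration only; the point a = 0.6 and the helper `central_difference` are assumptions made here, and the check says nothing about whether the derivation itself is logically complete.

```python
import math


def g(x):
    """Top half of the unit circle."""
    return math.sqrt(1 - x * x)


def g_prime(x):
    """Derivative obtained in Example 2 from 2 g(x) g'(x) + 2x = 0."""
    return -x / math.sqrt(1 - x * x)


def central_difference(f, a, step=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(a)."""
    return (f(a + step) - f(a - step)) / (2 * step)


a = 0.6
formula_value = g_prime(a)
numeric_value = central_difference(g, a)
residual = 2 * g(a) * g_prime(a) + 2 * a  # should vanish by the differentiated equation

print(formula_value, numeric_value, residual)
```

At a = 0.6 the point on the circle is (0.6, 0.8), so the slope -0.6/0.8 = -0.75 is what both computations should report.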


A remark is needed about the meaning of the theorem. In order that a
function be differentiable at a point a, it is necessary that the function be defined
on some interval with a inside, hence (with a smaller interval if necessary) on
some interval with center a. Therefore, it is part of the hypothesis of the
theorem that g is defined on an interval with center a and that f is defined on an
interval with center g(a). It is part of the conclusion that the composite
function h is defined on an interval with center a.

Let us examine this point. Since f is defined on an interval with center
g(a), there is a positive number $\varepsilon$ such that f is defined on the interval
$J = \{y : |y - g(a)| < \varepsilon\}$. According to Theorem 3.1, there is a positive number $\delta$
such that g is defined on the interval $I = \{x : |x - a| < \delta\}$. Furthermore,
$|g(x) - g(a)| < \varepsilon$ whenever $x \in I$. Then the composite function h is defined
on I.

Proof of the Theorem To see what to do we write a formula that is not quite correct.

    $\frac{h(x) - h(a)}{x - a} = \frac{f(g(x)) - f(g(a))}{x - a} = \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}$
    $= \frac{f(y) - f(b)}{y - b} \cdot \frac{g(x) - g(a)}{x - a}$,

where y = g(x) and b = g(a). As x approaches a, the limit of the second
factor is g'(a). The limit of y is b (by Theorem 3.1), so the limit of the
first factor is f'(b). Therefore, the limit of the product is f'(b)g'(a).

The argument is perfectly correct, except that there may be points x
with g(x) = g(a), in which case it does not make sense. To avoid this
trouble, define a new function $\tilde{f}$:

    $\tilde{f}(x) = \frac{f(g(x)) - f(g(a))}{g(x) - g(a)}$ if $g(x) \ne g(a)$,
    $\tilde{f}(x) = f'(g(a))$ if $g(x) = g(a)$.

With this function we have the formula

    $\frac{h(x) - h(a)}{x - a} = \tilde{f}(x) \, \frac{g(x) - g(a)}{x - a}$,

for g(x) - g(a) cancels from the top and bottom as long as it is $\ne 0$, and
both sides of the formula are 0 if it is = 0. Now the rule that the
limit of a product is the product of the limits can be used, provided that

    $\lim_{\substack{x \to a \\ x \ne a}} \tilde{f}(x) = f'(g(a))$,

the proof of which is a good exercise for the reader.


Example 2 suggests that the chain rule can be used to calculate lots of
derivatives.


THEOREM 6.2 Let $g(x) = x^n$, where n is any rational number. Then $g'(x) = nx^{n-1}$ for any
x > 0.


This is the same formula that has been proved already when n is an integer.


Proof If n = m/k, where m and k are integers, then

    $g(x)^k = x^m$.

Taking the derivative of both sides at the point x, we have

    $k\,g(x)^{k-1} g'(x) = m x^{m-1}$;

therefore,

    $g'(x) = \frac{m}{k} \cdot \frac{x^{m-1}}{g(x)^{k-1}} = \frac{m}{k} \cdot \frac{x^{m-1}}{x^{m-n}} = n x^{n-1}$.


Exercise 1 Find the flaw in the proof of Theorem 6.2 and in the discussion of Example 2.


7 LOGARITHMS AND EXPONENTIALS


We shall show that the function

    $L(x) = \int_1^x \frac{1}{t}\,dt$ for x > 0    (1)

is the logarithm of x to a certain base and shall establish some of the properties
of the logarithm function. The basic property of logarithms is that

    $L(xy) = L(x) + L(y)$.    (2)

Let us prove it.
Notice first that according to Theorem 5.2 of Chapter 1,

    $L'(x) = \frac{1}{x}$ and $L(1) = 0$.    (3)

This determines L uniquely, for if two functions have the same derivative, then
they must differ by a constant. If two functions differ by a constant and take
the same value at one point, then they must be identical.

Let y be fixed and consider the two functions

    M(x) = L(xy) and N(x) = L(x) + L(y).

Since L(y) is constant, we have

    $N'(x) = \frac{1}{x}$.

By the theorem on the derivative of a composite function,

    $M'(x) = \frac{1}{xy} \cdot y = \frac{1}{x}$.

Thus, M and N have the same derivative and clearly take the same value at
x = 1 [that is, L(y)]. Hence N = M, and formula (2) is proved.


Formula (2) implies that

    L(a^n) = n L(a)                                                     (4)

for every real a > 0 and every rational n. First take x = y = a in (2) to get
L(a^2) = 2L(a). Then take x = a^2, y = a, to get L(a^3) = L(a^2) + L(a) = 3L(a),
etc. This gives formula (4) if n is any positive integer. If n is a negative
integer, write 1 = a^n a^{-n}, to get

    0 = L(1) = L(a^n) + L(a^{-n}) = L(a^n) - n L(a),

which gives formula (4) if n is any negative integer.
Finally, let n be a rational number, say n = m/k, k > 0, and let a^n = b.
Then a^m = b^k, so

    L(a^n) = L(b) = (1/k) L(b^k) = (1/k) L(a^m) = (m/k) L(a) = n L(a).

This gives formula (4) when n is any rational number.

What about the formula when n is an arbitrary real number? In this case
the troublesome point is the meaning of a^n. For instance, how would you define
2^π? The best thing to do is to take formula (4) as the definition! But for this
we need a theorem. (How to prove such theorems is the subject of Chapter 3.
For the moment we just assume it.)

THEOREM 7.1  For every real number y there is one and only one positive real
number x such that L(x) = y. Moreover, the inverse function E defined by

    E(y) = x if and only if L(x) = y                                    (5)

is differentiable at every point.

With the aid of this theorem, a^n can be defined to be the unique number x
such that L(x) = n L(a).

Now, let us show that L is, in fact, a logarithm and that the inverse function
E is an exponential. According to the theorem, there is a unique number e
such that L(e) = 1. Then, according to the definition of e^x,

    L(e^x) = x L(e) = x.

Hence, according to the definition of E as the inverse of L,

    E(x) = e^x.

Therefore, L, which is the inverse of E, is the logarithm to the base e.
In calculus log x always means the logarithm to the base e; that is, log x
means L(x). (Some books use ln x instead of log x.)
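
The properties just proved can be checked numerically. The following is only a sketch: it approximates the integral in formula (1) by the midpoint rule, and the helper name L and the step count are our own choices, not part of the text.

```python
import math

def L(x, steps=20000):
    """Approximate L(x) = integral from 1 to x of dt/t by the midpoint rule."""
    if x <= 0:
        raise ValueError("L is defined only for x > 0")
    h = (x - 1.0) / steps          # h is negative when x < 1, so L(x) < 0 there
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(steps))

# Formula (2): L(xy) = L(x) + L(y)
x, y = 2.5, 0.4
print(L(x * y), L(x) + L(y))

# Formula (4) with the rational exponent n = 3/2
a, n = 2.0, 1.5
print(L(a ** n), n * L(a))

# L agrees with the library logarithm to the base e
print(L(7.0), math.log(7.0))
```

Each printed pair should agree to several decimal places; the small discrepancies come from the approximation of the integral, not from the formulas.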


Exercise 1  Use the equation L(E(x)) = x to show that E'(x) = E(x) = e^x.

Exercise 2  Show that if A(x) = a^x, then A'(x) = a^x log a.

It is natural to question whether our definition of a^x is the right one.
Exercise 2 shows that this definition does make a^x continuous, and formula (4)
shows that it does give the right value when x is rational. Therefore, Exercise 3
shows that this definition is the only one that is reasonable.

Exercise 3  If two continuous functions agree on all the rational points, then
they are identical. (Hint: For every real number x and every ε > 0, there is a
rational number n such that |x - n| < ε.)

Section 1 of Chapter 3 deals with the way to prove theorems like Theorem
7.1 and Theorem 6.2. In both cases there is a given function f. [Here it is
the function f(x) = L(x); in Theorem 6.2 it is the function f(x) = x^k.] The
real problem is to know that for every point y there is some point x with f(x) = y.
The fact that there is not more than one such x is immediate. Both functions
have the property that if x_1 < x_2, then f(x_1) < f(x_2). There is also the problem
of showing that the inverse function g, defined by

    g(y) = x if and only if f(x) = y,

is differentiable, but this problem is not so hard.

Exercise 4  Show that e^x e^y = e^{x+y} and that (e^x)^y = e^{xy}.

Exercise 5  Show that if f(x) = x^n, x > 0, then f'(x) = n x^{n-1}, where n is an
arbitrary real number.

Exercise 6  The number e is about 2.7. Prove that 2.5 < e < 3.


3 : Deeper Properties of Continuous Functions


1  INVERSE FUNCTIONS


The problem that was left unresolved in the last two sections is one of existence:
to show that for each positive real number y, there exists a positive real number
x with x^k = y; and to show that for each real number y, there exists a positive
real number x with log x = y. Such problems cannot be touched without the
aid of a fundamental property of the real numbers that has not appeared so
far. There are many ways to state it, one of which is the following:

AXIOM 1.1  Every nonempty bounded set of real numbers has a least upper bound.

An upper bound of a set S is a number b such that b ≥ x for every x ∈ S.
A least upper bound is one that is smaller than any other. Certainly, there
cannot be more than one, for each would have to be smaller than the other.

The least upper bound b of a set S is characterized by the following
conditions:

A. b ≥ x for every x ∈ S.
B. For every positive number ε, there is a point x ∈ S with x > b - ε.

The first condition says that b is an upper bound. The second says that
b - ε is not, no matter how small ε.

The greatest lower bound of a set is defined similarly. It is a lower bound
that is larger than any other.

Exercise 1  Every nonempty bounded set of real numbers has a greatest lower bound.


The least upper bound of a set S is often called the supremum and is usually
written sup S. The greatest lower bound is often called the infimum and is usually
written inf S.

The basic theorem on the existence of a solution to an equation f(x) = y
is as follows.

THEOREM 1.2  Let f be a continuous function on an interval I, and let y be any
number. If there exist points a and b in I with f(a) < y < f(b), then there
exists a point x between a and b with f(x) = y.

Proof  Suppose, to be definite, that a < b and let

    S = {x : a ≤ x ≤ b and f(x) ≤ y}.

S is nonempty, as a ∈ S, and bounded, as S ⊂ [a, b]. Therefore c = sup S
exists and lies in [a, b]. It will be shown that y = f(c).

Given ε > 0, choose δ > 0 so that if |x - c| < δ, then |f(x) - f(c)| < ε.
According to property B of least upper bounds, there is a point x ∈ S
with x > c - δ; according to property A, x ≤ c. Consequently,
y ≥ f(x) > f(c) - ε. Since this holds for all ε, y ≥ f(c), which requires c < b.

Again, given ε > 0, choose δ as above and < b - c. For any point x
satisfying c < x < c + δ, y < f(x) < f(c) + ε. Since this holds for all ε,
y ≤ f(c). Consequently, y = f(c).

The geometric version of this theorem sounds quite obvious. It says that
if the graph is sometimes below a given horizontal line and sometimes above,
then somewhere it must cross. The point that is found in the proof is the last
point between a and b where it does cross.

The idea of the last (or first) point at which a graph crosses a given line can
be slippery. Take, for instance,

    f(x) = x sin(1/x)    on 0 < x < 1.

There is certainly no first point at which the graph crosses the x axis.

Exercise 2  Give an explanation to reconcile this example with the proof of the
theorem.

Exercise 3  Give a proof of the theorem that produces the first point between a
and b at which the graph crosses the given horizontal line.

Exercise 4  Use the theorem to prove the existence of solutions to the equations
x^k = y and log x = y.
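
Theorem 1.2 asserts only that a solution exists. To compute one in practice, a common device is bisection, which is a different technique from the least-upper-bound argument of the proof but rests on the same hypothesis f(a) < y < f(b). The sketch below is illustrative; the name solve and the tolerance are our own assumptions.

```python
import math

def solve(f, y, a, b, tol=1e-12):
    """Find x between a and b with f(x) = y, for continuous f with f(a) < y < f(b)."""
    if not (f(a) < y < f(b)):
        raise ValueError("need f(a) < y < f(b)")
    lo, hi = a, b                  # invariant: f(lo) <= y <= f(hi)
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2.0
        if f(mid) <= y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# The two equations of this section: x^k = y (here k = 3, y = 10) and log x = 1.
cube_root_of_10 = solve(lambda x: x ** 3, 10.0, 0.0, 10.0)
e_approx = solve(math.log, 1.0, 1.0, 10.0)
print(cube_root_of_10, e_approx)
```

Bisection returns some crossing point, not necessarily the last one that the proof produces.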


The characteristic property of an interval (indeed, the best way to define 
an interval!) is that if it contains two points, then it must contain every point 
between them. 


Exercise 5  Show that this definition of interval gives exactly the usual several
kinds of intervals. (You will have to use Axiom 1.1!)

In terms of this characterization of intervals, Theorem 1.2 can be restated
as follows.

THEOREM 1.3  If f is continuous on an interval I, then the set

    f(I) = {y : y = f(x) for some x ∈ I}

is an interval.

COROLLARY 1.4  If f is continuous and one to one on an interval I, then f is
strictly increasing or strictly decreasing.

To say that f is one to one means that f(a) ≠ f(b) for any two distinct
points a and b.

Proof  It is not hard to see that what must be ruled out is the possibility of
three points a, b, and c such that a < b < c, while

    f(a) < f(b) > f(c)    or    f(a) > f(b) < f(c).                     (1)

Exercise 6  Prove rigorously that this is what must be ruled out. The first
possibility is roughly that f increases for a while, then decreases. The second
is that f decreases for a while, then increases.

It is clear that Theorem 1.2 does rule out the possibilities envisioned in
formula (1). Take the first, for instance. If f(c) > f(a), the theorem says
that there is a point x between a and b with f(x) = f(c), which is impossible
if f is one to one. If f(c) < f(a), then the theorem says that there is a
point x between b and c with f(x) = f(a), which is again impossible if f is
one to one. The second possibility is ruled out in the same way.


Before going on, let us recall some general definitions. A function from a
set X into a set Y is a rule that assigns a point of Y to each point of X. The
point assigned by the rule f to a given x ∈ X is called the value of f at x and is
written f(x). The set of all values of f is called the range of f and is written f(X).

The function f is one to one if it takes distinct values at distinct points;
that is, f(x_1) ≠ f(x_2) whenever x_1 ≠ x_2. If f is one to one, then a function g
can be defined from f(X) into X as follows: If y ∈ f(X), then by the definition
of f(X) there is at least one x ∈ X with f(x) = y; and by the fact that f is one to
one, there is only one such x. g is the rule that assigns this point x to the point y.
In other words, g(y) = x if and only if f(x) = y. The function g defined in this
way is called the inverse of f. It is a function from f(X) into X.

THEOREM 1.5  Let f be continuous on the interval I. If f is one to one, then the
inverse is continuous on f(I).

Proof  Let b ∈ f(I), and let ε > 0 be given. If b = f(a), then for δ we can take
the smaller of the two numbers |b - f(a - ε)| and |b - f(a + ε)|. Indeed,
suppose that f is increasing. Then plainly the inverse, g, is, too; so if
f(a - ε) < y < f(a + ε), then a - ε < g(y) < a + ε.

The next question is the differentiability of the inverse function.

THEOREM 1.6  Let f be continuous and one to one on the interval I. If f is
differentiable at a and f'(a) ≠ 0, then the inverse, g, is differentiable at
b = f(a), and

    g'(b) = 1/f'(a).

Proof  By the definition of the inverse we have y = f(g(y)); so if g is
differentiable at b, then Theorem 6.1 of Chapter 2 (chain rule) gives
1 = f'(a)g'(b).

To show that g is differentiable at b, note that

    [f(g(y)) - f(g(b))] / [g(y) - g(b)] · [g(y) - g(b)] / (y - b) = 1,

so if x = g(y) and a = g(b), then

    [g(y) - g(b)] / (y - b) = (x - a) / (f(x) - f(a)).

By the previous theorem, g is continuous at b; so if y → b, it follows that
x → a. Hence, the right-hand side approaches 1/f'(a).

Remark  This is the same idea as the proof of Theorem 6.1, except that g is one
to one; so the problem that arose there does not arise here; that is, if y ≠ b,
then g(y) ≠ g(b).

Exercise 7  Show that every positive number has a unique positive kth root, and
that the function g(x) = x^{1/k} is continuous on 0 ≤ x < ∞ and differentiable on
0 < x < ∞.
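
The formula g'(b) = 1/f'(a) of Theorem 1.6 can be tested on a function whose inverse has no convenient closed form. The sketch below assumes the sample function f(x) = x^3 + x, computes its inverse by bisection, and compares a difference quotient of the inverse with 1/f'(a); the helper names are hypothetical.

```python
def inverse(f, y, lo, hi, tol=1e-13):
    """Value g(y) of the inverse of a continuous increasing f on [lo, hi], by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

f = lambda x: x ** 3 + x           # strictly increasing, with f'(x) = 3x^2 + 1
f_prime = lambda x: 3 * x ** 2 + 1

a = 1.0
b = f(a)                           # b = f(a) = 2
h = 1e-5
g = lambda y: inverse(f, y, 0.0, 2.0)

# Symmetric difference quotient of g at b, to compare with 1/f'(a) = 1/4.
difference_quotient = (g(b + h) - g(b - h)) / (2 * h)
print(difference_quotient, 1.0 / f_prime(a))
```

The two printed numbers should agree to several decimal places.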


Exercise 8  The arcsin is the inverse of the sine. More precisely, if -1 ≤ x ≤ 1,
then arcsin x = y if and only if -π/2 ≤ y ≤ π/2 and sin y = x. Show that this
makes sense, that the arcsin is differentiable, and that its derivative is
(1 - x^2)^{-1/2}. Carry out a similar program for the arccos and the arctan.


2  UNIFORM CONTINUITY


Before turning to maxima and minima, which are also existence questions that
are settled by the least-upper-bound axiom, we need a stronger version of
continuity called uniform continuity. At present it may seem to be a technical
notion, but in fact it is very important.

DEFINITION 2.1  A function f on a set S is uniformly continuous on S if for every
positive number ε there is a positive number δ such that

    |f(x) - f(y)| < ε    if x ∈ S, y ∈ S, and |x - y| < δ.             (1)

At first glance it is hard to tell the difference from ordinary continuity.
The difference is this: To say that f is continuous on S means that it is continuous
at each point. This means that for each point x ∈ S and each positive number
ε there is a positive number δ. In short, δ is determined not by ε alone but by
both ε and x. In uniform continuity δ is determined by ε alone.

Consider, for example,

    f(x) = 1/x    on 0 < x < 1.

If we fix a positive number δ and take |x - y| = δ, we have

    |f(x) - f(y)| = |1/x - 1/y| = |y - x|/(xy) > δ/x.

As x moves toward 0, |f(x) - f(y)| becomes as large as we please, no matter
how small δ is. So the function 1/x is not uniformly continuous on 0 < x < 1.
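
The failure of uniform continuity for 1/x is easy to observe numerically. In this sketch (the helper name gap is our own) the distance δ between the two points is held fixed while x moves toward 0.

```python
def gap(f, x, delta):
    """|f(x) - f(y)| for the pair y = x + delta, so that |x - y| = delta."""
    return abs(f(x) - f(x + delta))

reciprocal = lambda x: 1.0 / x
delta = 1e-3

# Same delta throughout; only the location x changes.
gaps = [gap(reciprocal, x, delta) for x in (1e-1, 1e-2, 1e-3, 1e-4)]
for x, value in zip((1e-1, 1e-2, 1e-3, 1e-4), gaps):
    print(x, value)
```

The printed differences grow without bound as x decreases, so no single δ can serve for a given ε on the whole interval 0 < x < 1.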


It may appear that the fact that 1/x is unbounded has some bearing on the
matter. To some extent this is true.

THEOREM 2.2  A uniformly continuous function on a bounded set S must be bounded.

Proof  Take any ε, say ε = 1, and find the corresponding δ. Take an interval I
containing S and divide it in subintervals I_1, . . . , I_n of length less than
δ. If I_k ∩ S is not empty, choose some point x_k in it.
We shall show that if M is the maximum of the numbers |f(x_1)|, . . . ,
|f(x_n)|, then |f(x)| < M + 1 for every x ∈ S. Indeed, every x ∈ S
belongs to some I_k, hence satisfies |x - x_k| < δ for some k, and, therefore,
satisfies |f(x) - f(x_k)| < 1 for some k.

Exercise 1  Where did we use the fact that S is bounded? Show by example that
this is necessary.

Exercise 2  The function sin 1/x on 0 < x < 1 is bounded but not uniformly
continuous.

In the examples above the set S is an open interval, never a closed one.
The good reason for this is the following basic theorem.

THEOREM 2.3  Every continuous function on a closed bounded interval is uniformly
continuous.

Proof  We shall suppose that f is continuous but not uniformly continuous and
derive a contradiction. If f is not uniformly continuous, then for some ε
there is no δ. Thus, for every positive number r there is a pair of points
x_r and y_r such that

    |x_r - y_r| < r    and    |f(x_r) - f(y_r)| ≥ ε.                    (2)

(Otherwise r would be a δ.) The object is to find a point c with the
following property:

    For each positive number δ there is a positive number r
    such that r < δ and |x_r - c| < δ.                                  (3)

Let us assume for the moment that we have found such a point c.
Since f is continuous at c, we can find δ > 0 such that

    |f(x) - f(c)| < ε/2    if |x - c| < 2δ.

Take this δ and take r as in (3). Then

    |y_r - c| ≤ |y_r - x_r| + |x_r - c| < r + δ < 2δ,

so |f(y_r) - f(c)| < ε/2. But also |x_r - c| < δ, so |f(x_r) - f(c)| < ε/2.
Therefore, |f(x_r) - f(y_r)| < ε, which is in contradiction with (2).

The construction that produces a point c for which (3) holds is short
but tricky. For each s > 0 let

    c_s = sup{x_r : 0 < r < s},                                         (4)

and then let

    c = inf{c_s : s > 0}.                                               (5)

If δ > 0 is given, then by the definition of the greatest lower bound we
can find s so that c_s < c + δ. (Otherwise c + δ would be a lower bound.)
Now, if t < s, then plainly c_t ≤ c_s, for c_t is the least upper bound over a
smaller set. Therefore, we can suppose that s < δ. By the definition of
the least upper bound and the definition of c_s, we can find r < s so that
x_r > c_s - δ. For this, we have r < δ, and

    c - δ ≤ c_s - δ < x_r ≤ c_s < c + δ,

which shows that c does have the property (3).

Exercise 3  Since the theorem is false without them, the hypotheses that the
interval I on which f is defined is bounded and closed ought to have been used
somewhere in the proof. Where were they used?

The construction of the point c is of general value. This point is called the
limit superior of the x_r as r → 0.

Exercise 4  Let g be a bounded function defined on a set S of real numbers, and
let a be a point such that for each δ > 0 there is at least one point x ∈ S with
|x - a| < δ. Define the limit superior of g(x) as x → a and x ∈ S. [Hint: In the
above case g(r) = x_r for r > 0, a = 0, and S is the set of positive real
numbers.] Define also the limit inferior of g(x) as x → a and x ∈ S. Show that

    lim_{x→a, x∈S} g(x)

exists if and only if

    lim sup_{x→a, x∈S} g(x) = lim inf_{x→a, x∈S} g(x),

where the lim sup and the lim inf have the obvious meanings.

Exercise 5  Why is it required in Exercise 4 that for each δ > 0 there is at least
one point x ∈ S with |x - a| < δ?


Exercise 6  If f is uniformly continuous on the open interval (a, b), a and b
finite, then the limits

    lim_{x→a, x>a} f(x)    and    lim_{x→b, x<b} f(x)

both exist. Consequently, f(a) and f(b) can be defined so that f is continuous
on the closed interval [a, b].


3  MAXIMA AND MINIMA


THEOREM 3.1  A continuous function on a closed bounded interval has a maximum
and a minimum.

Proof  According to Theorem 2.3, the function f is uniformly continuous, and
then, according to Theorem 2.2, it is bounded. Let

    M = sup{f(x) : x ∈ I}.

The problem is to show that there is a point c ∈ I with f(c) = M. If not,
then the function

    g(x) = 1/(M - f(x))

is continuous on I, so by Theorems 2.2 and 2.3 it is bounded. This is
absurd, for by the definition of the least upper bound we can find a point
x such that

    M ≥ f(x) > M - ε,    hence    g(x) > 1/ε,

and this for any positive ε.

In Chapter 1 we had the following theorem.

THEOREM 3.2  If f has a maximum or minimum at a point c, and if f is
differentiable at c, then f'(c) = 0.


In practice the two theorems are used together to locate maxima and
minima. There is one point to watch for. In order to use Theorem 3.1 to
guarantee the existence of a maximum or minimum, the interval I on which the
function f is defined must be bounded and closed. That is, I must include its
end points. On the other hand, a function is never differentiable at the end
points of the interval on which it is defined. Thus, end points must always be
considered separately.

Example 1  Find the maximum and minimum of

    f(x) = 3x + 2√(1 - x^2)    on 0 ≤ x ≤ 1.

Since f is continuous on the closed interval, the maximum and minimum
exist. At every point x except the end points f is differentiable, and

    f'(x) = 3 + 2 · (1/2)(1 - x^2)^{-1/2}(-2x) = 3 - 2x(1 - x^2)^{-1/2};

so f'(x) = 0 if and only if x = 3/√13. By Theorem 3.2 this point and the
two end points are the only possibilities for the maximum and minimum. To
settle which is which we calculate the value of f at all three. We have

    f(0) = 2,    f(1) = 3,    f(3/√13) = √13.

Thus, the maximum occurs at 3/√13 and is equal to √13, while the minimum
occurs at 0 and is equal to 2.

Exercise 1  Justify the statements about continuity and differentiability that
were made in the example above.

Exercise 2  Find the maximum and minimum of g(x, y) = 3x - 2y on the unit circle.
[Hint: If (x, y) is on the unit circle, then x^2 + y^2 = 1, and we have:

(a) First approach. Solve for y in a suitable way, and reduce the problem
to Example 1.

(b) Second approach. Do not solve for y explicitly, but use the formula for
the derivative of a composite function.

(c) Third approach. Use the fact that x = cos θ, y = sin θ when (x, y) is
on the unit circle.]

Example 2  The equation of the Florida coast is y = x^2. A swimmer is at the
point (3, 0). How far is he from shore?

The distance from the point (3, 0) to the point (x, y) is √((x - 3)^2 + y^2),
so the function to be minimized is

    d(x) = √((x - 3)^2 + x^4).

[Figure 1]

It simplifies matters to notice that the minimum of d and of f(x) = d(x)^2 must
occur at the same point, for if d(x) ≥ d(a), then d(x)^2 ≥ d(a)^2. Thus, the
function to be minimized is

    f(x) = (x - 3)^2 + x^4.

We have

    f'(x) = 2(x - 3) + 4x^3 = 2(2x^3 + x - 3)
          = 2(x - 1)(2x^2 + 2x + 3).

Thus, f'(x) = 0 if and only if x = 1.
This means that if the function does have a minimum, it must occur at
x = 1, and the solution to the problem must be

    d(1) = √5.

It can be shown that the function d does have a minimum in the following
way. Choose any point (a, b) on the curve y = x^2, and let r be the distance
from (3, 0) to (a, b) (Figure 1). The points on the curve outside the circle with
center (3, 0) and radius r are clearly irrelevant. Their distance from (3, 0) is
greater than that of (a, b). In other words, if |x - 3| > r, then d(x) > d(a);
so it is sufficient to minimize d on the interval 3 - r ≤ x ≤ 3 + r. Theorem
3.1 applies to this problem.
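
Both examples can be checked by direct computation. The sketch below compares the candidate points of Example 1 and then, for Example 2, runs a crude grid search on the interval 0 ≤ x ≤ 6 (the interval 3 - r ≤ x ≤ 3 + r obtained from the point (a, b) = (0, 0), for which r = 3). The grid search is only a numerical check, not the method of the text, and the names are our own.

```python
import math

def f1(x):
    """The function of Example 1."""
    return 3 * x + 2 * math.sqrt(1 - x * x)

def d(x):
    """Distance from (3, 0) to the point (x, x^2) of Example 2."""
    return math.sqrt((x - 3) ** 2 + x ** 4)

# Example 1: the interior critical point and the two end points.
candidates = [0.0, 3 / math.sqrt(13), 1.0]
values = [f1(x) for x in candidates]
print(max(values), math.sqrt(13))   # maximum, to compare with sqrt(13)
print(min(values))                  # minimum, attained at x = 0

# Example 2: grid search on [0, 6] with step 0.001.
grid = [i / 1000 for i in range(6001)]
x_best = min(grid, key=d)
print(x_best, d(x_best), math.sqrt(5))
```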


4  THE MEAN-VALUE THEOREM


The following theorem, called the mean-value theorem, is fundamental.

THEOREM 4.1  Let f be continuous at every point of the closed interval I and
differentiable at every interior point. If a and b are any two points of I, then
there is a point ξ between them such that

    f(b) - f(a) = (b - a)f'(ξ).

A more general version is useful too and is easier to see how to prove because of
the symmetry of the statement.

THEOREM 4.2  Let f and g be continuous at every point of the closed interval I
and differentiable at every interior point. If a and b are any two points of I,
then there is a point ξ between them such that

    (f(b) - f(a))g'(ξ) = (g(b) - g(a))f'(ξ).

Theorem 4.1 is the special case in which g(x) = x.

Proof  Let h(x) = (f(b) - f(a))g(x) - (g(b) - g(a))f(x) on the closed interval
with end points a and b. By Theorem 3.1, h has both a maximum and
a minimum. If either of these is not an end point, then it is suitable as ξ,
for the equation h'(ξ) = 0 is just what is to be proved. Inspection shows
that h(b) = h(a), so the maximum and minimum cannot both occur at
end points unless h is constant, in which case h'(ξ) = 0 for every ξ.

THEOREM 4.3  Let f be continuous at each point of an interval I.

(a) If f'(x) = 0 at all but a finite number of points, then f is constant.

(b) If f'(x) ≥ 0 at all but a finite number of points, then f is increasing;
that is, if a < b, then f(a) ≤ f(b).

(c) If f'(x) > 0 at all but a finite number of points, then f is strictly
increasing; that is, if a < b, then f(a) < f(b).

Remark  Here it does not matter whether I is closed. It is taken closed in
Theorems 4.1 and 4.2 in order not to exclude the end points.

Proof  We shall prove part (a) of Theorem 4.3 and leave the other two parts as
exercises. The finite number of points where f'(x) = 0 does not hold (and these
may include points where f is not differentiable) divide I into a finite
number of subintervals. Let a and b be any two points in one of these
subintervals. Then the assumption of Theorem 4.1 is satisfied for the
interval with end points a and b, and the theorem gives

    f(b) - f(a) = (b - a)f'(ξ) = 0.

This shows that f is constant on each subinterval. But then it must be
constant on the whole interval I, because it is continuous.

Part (a) of this theorem is one of the keys to the calculation of integrals.
Recall that if

    F(x) = ∫_a^x f,

then F'(x) = f(x), at least if f is continuous. Part (a) of the theorem shows
that if G is any function whatever satisfying G'(x) = f(x) at all but a finite
number of points, then G and F differ by a constant; so

    ∫_a^b f = F(b) - F(a) = G(b) - G(a).

This was used in Section 5 of Chapter 1 and in Section 7 of Chapter 2.

Exercise 1  Let f'(a) = 0 and f''(a) > 0. Show that there is a positive number δ
such that f(x) ≥ f(a) for every x with |x - a| < δ. (This means that f has a
"local" minimum at the point a.)

Exercise 2  Define "local" maximum. State and prove the corresponding theorem.

Exercise 3  Suppose that f and g are continuous on an interval I and that
f'(x) ≤ g'(x) for all but a finite number of points. If f(a) ≤ g(a) for some
point a, then f(x) ≤ g(x) for all points x ≥ a. Show that e^x > x^2 + 1 for x > 0.
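
The proof of Theorem 4.2 locates ξ as an interior maximum or minimum of the auxiliary function h. The following sketch imitates that idea on a grid for the special case g(x) = x of Theorem 4.1; the function name, the grid size, and the sample function x^3 are assumptions made for illustration only.

```python
import math

def mean_value_point(f, a, b, n=20000):
    """Grid approximation to an interior extremum of
    h(x) = (f(b) - f(a)) * x - (b - a) * f(x), the auxiliary function
    of the proof in the case g(x) = x."""
    h = lambda x: (f(b) - f(a)) * x - (b - a) * f(x)
    interior = [a + (b - a) * i / n for i in range(1, n)]
    x_max = max(interior, key=h)
    x_min = min(interior, key=h)
    # h(a) = h(b); keep the extremum that differs more from the end-point value.
    if abs(h(x_max) - h(a)) >= abs(h(x_min) - h(a)):
        return x_max
    return x_min

cube = lambda x: x ** 3
xi = mean_value_point(cube, 0.0, 2.0)
slope = (cube(2.0) - cube(0.0)) / (2.0 - 0.0)   # (f(b) - f(a)) / (b - a) = 4
print(xi, 3 * xi ** 2, slope)                    # f'(xi) = 3 xi^2 should be near the slope
```

For this sample function the exact point is ξ = 2/√3, and the grid value should be close to it.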


5  ZERO AND INFINITY


The limit of a quotient necessarily exists and is the quotient of the limits
whenever the limit of the denominator is ≠ 0. But the limit of the quotient may
well exist even though the limit of the denominator is 0. One example that has
been important already is the fact that

    lim_{x→0, x≠0} (sin x)/x = 1.

Indeed, every derivative is this sort of limit.

There is a simple rule for the calculation of such limits, which says that

    lim_{x→a} f(x)/g(x) = lim_{x→a} f'(x)/g'(x)

when both f and g have limit 0 or ∞ and the limit on the right exists. This
rule is called l'Hospital's rule (after the Marquis de l'Hospital, who is revered
by students everywhere for having written the first calculus book). The same
rule also holds for left- and right-hand limits.

Example 1  [The display is garbled in the source: a limit as x → 1, x ≠ 1, of a
quotient with denominator 1 - x, evaluated by differentiating numerator and
denominator.]

Example 2

    lim_{x→0, x≠0} (1 - cos x)/x^2 = lim_{x→0, x≠0} (sin x)/(2x) = 1/2.

Example 3

    lim_{x→0, x>0} x log x = lim_{x→0, x>0} (log x)/(1/x)
                           = lim_{x→0, x>0} (1/x)/(-1/x^2)
                           = lim_{x→0, x>0} (-x) = 0.

Exercise 1  Show that

    lim_{x→0, x>0} x^p log x = 0

for every p > 0.

The theorem is as follows:

THEOREM 5.1  (l'Hospital's Rule) Let f and g be differentiable on the open
interval (a, b) with g'(x) ≠ 0 for all x. Let f and g both have limit 0 or both
have limit ±∞ as x approaches b from the left. If the limit

    lim_{x→b, x<b} f'(x)/g'(x)

exists, then so does the limit

    lim_{x→b, x<b} f(x)/g(x),

and the two are equal.


DEFINITION 5.2  The statement

    lim_{x→b, x<b} f(x) = ∞

means that for every positive number M there is a positive number δ such that if
|x - b| < δ and x < b, then f(x) > M.

Exercise 2  Give the definition of

    lim_{x→b, x<b} f(x) = -∞.

The first step in the proof is to show that the function g' cannot change
sign. It is always positive or always negative. If g' is continuous, then this
follows from Theorem 1.2, but it is not assumed that g' is continuous.

LEMMA 5.3  Let g' exist and be ≠ 0 on (a, b). Then g' does not change sign.

Proof  We show that if g'(c) > 0 and g'(d) < 0, then there is a point ξ between
c and d with g'(ξ) = 0. Suppose that c < d, and let ξ be a point at which
g is maximum on [c, d]. All that is necessary is to show that ξ cannot be
either c or d. Since g'(c) > 0, it follows that g(x) > g(c) for every x near
c and to the right of c. Hence ξ ≠ c. Since g'(d) < 0, it follows that
g(x) > g(d) for every x near d and to the left of d. Hence ξ ≠ d.

Proof of Theorem 5.1  Multiplying by -1 if necessary, we can suppose that
g'(x) > 0 for each x. Let

    l = lim_{x→b, x<b} f'(x)/g'(x),

and let ε be a given positive number. Choose δ > 0 so that if |x - b| < δ
and x < b, then

    (l - ε)g'(x) < f'(x) < (l + ε)g'(x).                                (1)

From the inequality (1) it follows that if c and d are any two points satisfying
b - δ < c < d < b, then

    (l - ε)(g(d) - g(c)) ≤ f(d) - f(c) ≤ (l + ε)(g(d) - g(c)).          (2)

Indeed (for example), according to Theorem 4.3(b), the function
(l + ε)g - f is increasing, so

    (l + ε)g(d) - f(d) ≥ (l + ε)g(c) - f(c),

which is the same as the right-hand inequality in (2).


At this point the cases where f and g have limit 0 and where f and g
have limit ∞ separate. Take first the case where f and g have limit 0.
In this case fix c = x and let d → b. The result is

    -(l - ε)g(x) ≤ -f(x) ≤ -(l + ε)g(x)    for b - δ < x < b,

and division by -g(x) gives

    l - ε ≤ f(x)/g(x) ≤ l + ε

as required. [Note that g(x) < 0, since g is strictly increasing and has
limit 0 at b.]

Now suppose that f and g have limit ±∞. Of course, g must have
limit +∞ because of the hypothesis that g' > 0. Fix c in the inequality
(2), take d = x, and divide by g(x). The result is

    (l - ε)(1 - g(c)/g(x)) ≤ f(x)/g(x) - f(c)/g(x) ≤ (l + ε)(1 - g(c)/g(x)).

Since the limit of g is ∞ and c is fixed, we can choose δ_1 < δ so that if
|x - b| < δ_1 and x < b, then

    |g(c)/g(x)| < ε    and    |f(c)/g(x)| < ε    for b - δ_1 < x < b.

Then we have

    (l - ε)(1 - ε) - ε < f(x)/g(x) < (l + ε)(1 + ε) + ε    for b - δ_1 < x < b.

This completes the proof.


Exercise 3  The theorem remains correct when

    lim_{x→b, x<b} f'(x)/g'(x) = ±∞.

(Hint: The proof is almost correct, but the expressions l ± ε are nonsense.)

DEFINITION 5.4  The statement lim_{x→∞} f(x) = l means that for every positive
number ε there is a positive number r such that if x > r, then |f(x) - l| < ε.

Exercise 4  Give the definitions of

    lim_{x→∞} f(x) = ∞,    lim_{x→-∞} f(x) = l,    etc.,

and show that Theorem 5.1 remains correct in all cases.


Exercise 5  For every a > 1, p > 0, lim_{x→∞} x^p a^{-x} = 0.

Exercise 6  For every p > 0, lim_{x→∞} (log x)/x^p = 0.

Example 4  The function f(x) = (1 + 1/x)^x is strictly increasing and
lim_{x→∞} f(x) = e.

It is equivalent to show that the function g(y) = (1 + y)^{1/y} is strictly
decreasing on y > 0 and has limit e as y → 0. The obvious idea is to consider

    h(y) = log g(y) = log(1 + y)/y.

It is equivalent to show that h is strictly decreasing on y > 0 and has limit 1 as
y → 0.

The fact that h has limit 1 is evident from l'Hospital's rule, so what remains
is to show that h'(y) < 0 for y > 0. Now,

    h'(y) = [y - (1 + y) log(1 + y)] / [(1 + y)y^2],

so it is enough to prove that k(y) = y - (1 + y) log(1 + y) < 0 for y > 0.
Since k(0) = 0, it is enough to prove that k is decreasing, that is, that k'(y) is
negative. But

    k'(y) = -log(1 + y),

and this is negative because 1 + y > 1.
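
The limits of Examples 2, 3, and 4 can be observed numerically. The sketch below simply tabulates the quotients at a few points; the variable names are our own, and a table of values illustrates the limits without proving them.

```python
import math

small = [10.0 ** (-k) for k in range(1, 5)]      # x = 0.1, 0.01, 0.001, 0.0001

# Example 2: (1 - cos x) / x^2 should approach 1/2 as x -> 0.
example2 = [(1 - math.cos(x)) / x ** 2 for x in small]

# Example 3: x log x should approach 0 as x -> 0 through positive values.
example3 = [x * math.log(x) for x in small]

# Example 4: (1 + 1/x)^x should increase toward e as x -> infinity.
large = [10.0 ** k for k in range(0, 5)]         # x = 1, 10, ..., 10000
example4 = [(1 + 1 / x) ** x for x in large]

print(example2)
print(example3)
print(example4, math.e)
```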


4 : Riemann Integration


1  AREA


It is time to reconsider the problem of area to see just what it means to say that
the area under a curve exists.

Let f be a nonnegative bounded function on the interval [a, b] = {x : a ≤
x ≤ b}, and let A be the set under the curve y = f(x),

    A = {(x, y) : a ≤ x ≤ b and 0 ≤ y ≤ f(x)}.

The natural approach is to approximate the set A by a union of rectangles.
The way to do this is to divide the interval [a, b] into small intervals, and on
each small interval replace f by a constant. First we shall do this in such a way
as to produce a set that contains A and, therefore, should have a larger area.
Choose a finite sequence of points

    a = x_0 < x_1 < . . . < x_n = b.

For each i let

    M_i = sup{f(x) : x_{i-1} ≤ x ≤ x_i},

and let R_i be the rectangle with base [x_{i-1}, x_i] and height M_i. It is clear that
the set A is contained in the set

    R̄ = ∪_{i=1}^{n} R_i.

The picture looks as shown in Figure 1.

The sequence (x_0, x_1, . . . , x_n) is called a partition of the interval [a, b].
Of course, the numbers and sets defined above depend on the particular
partition. If p is a partition, R̄(p) will denote the corresponding set R̄. In this
notation, the assertion of the last paragraph is that for every partition p,

    A ⊂ R̄(p).                                                          (1)


[Figure 1: the rectangles R_i over a partition a = x_0 < x_1 < . . . < x_4 = b]

The area of R̄(p) is obviously the sum

    S̄(p) = Σ_{i=1}^{n} M_i(x_i - x_{i-1}).                              (2)

Consequently, if the area of the set A does make sense, then

    area A ≤ S̄(p),

which says that the area of A is a lower bound of all the numbers S̄(p). It is,
therefore, ≤ their greatest lower bound. In other words,

    area A ≤ S̄ = inf{S̄(p) : p is a partition of [a, b]}.                (3)

There is a similar construction to produce a union of rectangles which is
contained in A. All that is necessary is to replace the least upper bound M_i by
the greatest lower bound

    m_i = inf{f(x) : x_{i-1} ≤ x ≤ x_i}.

If R_i is the rectangle with base [x_{i-1}, x_i] and height m_i, and

    R̲(p) = ∪_{i=1}^{n} R_i,

then clearly A ⊃ R̲(p), so that

    area A ≥ S̲(p) = Σ_{i=1}^{n} m_i(x_i - x_{i-1});                      (4)


hence

    area A ≥ S̲ = sup{S̲(p) : p is a partition of [a, b]}.                (5)

The two inequalities (3) and (5) give

    S̲ ≤ area A ≤ S̄.                                                     (6)

If it happens that S̲ = S̄, then there is no doubt about what the area must
be: it must be the common value of S̲ and S̄. The idea is to start afresh and to
use this as the definition of area.

DEFINITION 1.1  The area of the set A exists if S̲ = S̄. If the area does exist,
then it is the common value of S̲ and S̄.

The program for what must be done is fairly clear.

1. We must show that the definition makes sense in an analytical
framework that is independent of the intuitive idea of area.

2. We must show that the area does exist when the function f is reasonable,
say continuous.

3. We must show that the notion of area defined in this way has the
properties that area ought to have.

In connection with point 2, consider the function f on [0, 1], which is equal
to 1 at each irrational point and to 0 at each rational point. Any subinterval
contains both rational and irrational points, so each m_i is equal to 0 and each
M_i is equal to 1. Hence, for any partition p, S̲(p) = 0 and S̄(p) = 1;
consequently, S̲ = 0 and S̄ = 1. According to Definition 1.1, the area under the
curve does not exist. The present theory of area and integration is designed
basically for continuous functions, or at least functions that are almost
continuous. In Chapter 13 there is a more general theory that does assign an area
to this particular set.

Although the collected works of Riemann fill only one volume, he was one 
of the most profound and original mathematicians of all time. 


[Portrait: Bernhard Riemann]


2 




INTEGRALS 


The foregoing developments can be carried out in an analytical framework that 
has nothing to do with area. The expressions to consider are suggested by 
area, but they can be interpreted in many other interesting and useful ways. 

Let f be a bounded function on the interval [a, b]. If p is a partition of
[a, b], the numbers $M_i$ and $m_i$ are defined as before, and the sums $\overline{S}(p)$ and $\underline{S}(p)$
are defined by

$$\overline{S}(p) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}), \qquad \underline{S}(p) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}). \tag{1}$$

Exercise 1  The sets $\overline{R}(p)$ and $\underline{R}(p)$ could also be defined but would not be useful. Why?
(Two reasons.)


It is clear that $\underline{S}(p) \le \overline{S}(p)$ for a single partition p, but we shall have to
find a new proof of the fact that $\underline{S}(q) \le \overline{S}(p)$ for any two partitions p and q.
[The old proof depended on the interpretation of $\underline{S}(p)$ as the area of $\underline{R}(p)$, etc.]
The new one is based on an investigation of what happens to $\underline{S}(p)$ and $\overline{S}(p)$
when points are added to the partition p. This investigation is important in
other connections as well.


DEFINITION 2.1  The partition r is a refinement of p, written $r < p$, if every point in p is
also in r.


LEMMA 2.2  If $r < p$, then $\underline{S}(r) \ge \underline{S}(p)$ and $\overline{S}(r) \le \overline{S}(p)$.


Proof  It is enough to consider the case where r is obtained from p by adding just
one point y. Suppose that y is between $x_{i-1}$ and $x_i$, and let

$$M_i' = \sup\{f(x) : x_{i-1} \le x \le y\}, \qquad M_i'' = \sup\{f(x) : y \le x \le x_i\}.$$

The only difference between $\overline{S}(r)$ and $\overline{S}(p)$ is that the former contains
$M_i'(y - x_{i-1}) + M_i''(x_i - y)$ in place of $M_i(x_i - x_{i-1})$. It is obvious from
the definitions that $M_i' \le M_i$ and $M_i'' \le M_i$, so

$$M_i'(y - x_{i-1}) + M_i''(x_i - y) \le M_i(y - x_{i-1}) + M_i(x_i - y) = M_i(x_i - x_{i-1}).$$

This proves that $\overline{S}(r) \le \overline{S}(p)$, and the other inequality is proved similarly.


LEMMA 2.3  If p and q are any two partitions, then

$$\underline{S}(q) \le \overline{S}(p).$$


Proof  If r is the common refinement of p and q (i.e., the partition whose points
are those of p together with those of q), then Lemma 2.2 gives

$$\underline{S}(q) \le \underline{S}(r) \le \overline{S}(r) \le \overline{S}(p). \tag{2}$$


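Having Lemma 2.3, we can proceed as before. Fix q and take the lower
bound on p to obtain $\underline{S}(q) \le \overline{S}$. Then take the upper bound on q to obtain
$\underline{S} \le \overline{S}$. This gives


LEMMA 2.4  For any partition p,

$$\underline{S}(p) \le \underline{S} \le \overline{S} \le \overline{S}(p); \quad \text{hence} \quad 0 \le \overline{S} - \underline{S} \le \overline{S}(p) - \underline{S}(p).$$


The importance of the lemma is to show that in order to prove that $\underline{S} = \overline{S}$ it is
sufficient to prove that for each positive number $\epsilon$ there is some partition p with
$\overline{S}(p) - \underline{S}(p) < \epsilon$. And Lemma 2.2 shows that the situation is improved by
refinement.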
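As a small numerical illustration of Lemmas 2.2 and 2.4, the sketch below computes upper and lower sums for the sample function $f(x) = x^2$ on [0, 1]. The function names `uniform_partition` and `darboux_sums` are illustrative choices, not part of the text, and the code assumes that f is increasing, so that each $M_i$ and $m_i$ is attained at an end point of its subinterval.

```python
def uniform_partition(a, b, n):
    """Return the n + 1 points of the partition of [a, b] into n equal parts."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def darboux_sums(f, points):
    """Return (upper, lower) sums of an INCREASING function f for a partition.

    Assumption: f is increasing, so on each subinterval the supremum M_i is
    f(right end point) and the infimum m_i is f(left end point).
    """
    upper = 0.0
    lower = 0.0
    for left, right in zip(points, points[1:]):
        width = right - left
        upper += f(right) * width
        lower += f(left) * width
    return upper, lower


def square(x):
    return x * x


# Each partition refines the previous one (every old point is kept).
results = {}
for n in (4, 8, 16, 32):
    results[n] = darboux_sums(square, uniform_partition(0.0, 1.0, n))
    upper, lower = results[n]
    print(n, lower, upper, upper - lower)
```

For this particular function the difference between the upper and lower sums works out to 1/n, so refinement drives it below any given positive number.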


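DEFINITION 2.5  The function f is Riemann integrable on the interval [a, b] if $\underline{S} = \overline{S}$. If
this is the case, the common value of $\underline{S}$ and $\overline{S}$ is called the integral of f and
is written

$$\int_a^b f \qquad \text{or} \qquad \int_a^b f(x)\,dx.$$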


Let us begin by showing that every continuous function is Riemann 
integrable. 


THEOREM 2.6  If f is continuous on [a, b], then f is Riemann integrable.


Proof  According to Lemma 2.4, we have to show that if $\epsilon > 0$ is given, then we
can produce some partition p such that $\overline{S}(p) - \underline{S}(p) < \epsilon$. We shall show
that in fact this holds for every sufficiently fine partition, the fineness
being measured by the number

$$|p| = \max_i\,(x_i - x_{i-1}).$$

Let $\epsilon > 0$ be given and use the uniform continuity of f to find $\delta > 0$
such that $|f(x) - f(y)| < \epsilon$ if $|x - y| < \delta$, hence such that

$$f(x) < f(y) + \epsilon \quad \text{if } |x - y| < \delta. \tag{3}$$

This inequality implies that

$$M_i - m_i \le \epsilon \quad \text{if } |p| < \delta. \tag{4}$$

Indeed, if x and y are both in $[x_{i-1}, x_i]$, then $|x - y| < \delta$, so the inequality
in (3) holds. First fix y and take the upper bound on x to obtain $M_i \le f(y) + \epsilon$,
hence $M_i - \epsilon \le f(y)$. Now take the lower bound on y to obtain
$M_i - \epsilon \le m_i$, which is just (4). The inequality (4) gives

$$\overline{S}(p) - \underline{S}(p) = \sum_{i=1}^{n} (M_i - m_i)(x_i - x_{i-1}) \le \epsilon \sum_{i=1}^{n} (x_i - x_{i-1}) = \epsilon(b - a).$$

[In order to come out with $\epsilon$, we would of course simply start with $\epsilon/(b - a)$
instead of $\epsilon$.]


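THEOREM 2.7  If f and g are Riemann integrable on [a, b] and $\alpha$ is a real number, then $\alpha f$
and f + g are integrable and

$$\int_a^b \alpha f = \alpha \int_a^b f \qquad \text{and} \qquad \int_a^b (f + g) = \int_a^b f + \int_a^b g.$$


Exercise 2  Prove the theorem by showing that

$$\underline{S}(p; f) + \underline{S}(p; g) \le \underline{S}(p; f + g) \le \overline{S}(p; f + g) \le \overline{S}(p; f) + \overline{S}(p; g).$$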


THEOREM 2.8  If f is Riemann integrable on [a, b], then f is Riemann integrable on each
subinterval. Conversely, let c be a point between a and b. If f is Riemann
integrable on [a, c] and on [c, b], then f is Riemann integrable on [a, b] and

$$\int_a^b f = \int_a^c f + \int_c^b f.$$


Proof  Let c be a point between a and b. If $p'$ is a partition of [a, c], $p''$ a partition
of [c, b], and p the partition of [a, b] obtained by taking the points of $p'$
together with those of $p''$, then it is obvious that

$$\overline{S}(p) = \overline{S}(p') + \overline{S}(p'') \qquad \text{and} \qquad \underline{S}(p) = \underline{S}(p') + \underline{S}(p''). \tag{5}$$

Therefore,

$$\overline{S}(p) - \underline{S}(p) = \overline{S}(p') - \underline{S}(p') + \overline{S}(p'') - \underline{S}(p''). \tag{6}$$

First suppose that f is integrable on [a, c] and on [c, b]. For any
$\epsilon > 0$ we can choose $p'$ and $p''$ so that both differences on the right of (6) are
$< \epsilon$. Then $\overline{S}(p) - \underline{S}(p) < 2\epsilon$, and since $\epsilon$ is arbitrary, it follows that f is
integrable on [a, b]. Prove the addition formula as an exercise by using
formula (5).

Next suppose that f is integrable on [a, b]. Choose a partition p of
[a, b] such that $\overline{S}(p) - \underline{S}(p) < \epsilon$. Since this is only improved by refinement,
it can be assumed that c is one of the points in p, in which case p
decomposes into a $p'$ and a $p''$. Then formula (6) shows that
$\overline{S}(p') - \underline{S}(p') < \epsilon$, which implies that f is integrable on [a, c], and also shows that
$\overline{S}(p'') - \underline{S}(p'') < \epsilon$, which implies that f is integrable on [c, b].


If [c, d] is any subinterval and f is integrable on [a, b], then what has
been proved shows first that f is integrable on [a, d], and then that f is
integrable on [c, d].


THEOREM 2.9  If f is Riemann integrable on [a, b] and if $m \le f(x) \le M$, then

$$m(b - a) \le \int_a^b f \le M(b - a).$$


Proof  It is clear either from the definition or from Lemma 2.2 that for each
partition p we have

$$m(b - a) \le \underline{S}(p) \le \overline{S}(p) \le M(b - a), \tag{7}$$

and, consequently,

$$m(b - a) \le \underline{S} \le \overline{S} \le M(b - a). \tag{8}$$


So far the only integrable functions we know are the continuous functions. 
This can be improved easily to allow at least a finite number of discontinuities. 


THEOREM 2.10  If f is bounded on [a, b] and continuous at all but a finite number of points,
then f is Riemann integrable.


Proof  Because of Theorem 2.8, it can be supposed that f is continuous at all but
one of the end points, say the end point a. Let $\epsilon$ be a given positive
number, let $|f(x)| \le M$, and let $c = a + \epsilon/M$.
For any partition $p'$ of [a, c] we have

$$\overline{S}(p') - \underline{S}(p') \le 2M(c - a) = 2\epsilon.$$

Since f is continuous on [c, b], Theorem 2.6 shows that for any sufficiently
fine partition $p''$ of [c, b] we have

$$\overline{S}(p'') - \underline{S}(p'') < \epsilon.$$

If p is the partition of [a, b] determined by $p'$ and $p''$, then formula (6)
gives $\overline{S}(p) - \underline{S}(p) < 3\epsilon$, which proves the theorem.


Theorems 2.8 and 2.9 give the properties that were used in Section 5 of 
Chapter 1 to prove the fundamental theorem of calculus, which expresses the 
relation between the integral and the derivative: 


THEOREM 2.11  (Fundamental Theorem of Calculus) If f is Riemann integrable on [a, b] and

$$F(x) = \int_a^x f,$$

then F is continuous on [a, b] and $F'(x) = f(x)$ at every point x where f is
continuous.


Exercise 3  Look back at the proof of Theorem 5.2 of Chapter 1, and then prove this
theorem.


DEFINITION 2.12  Let f be a function on an interval I. A primitive of f is a function G that is
continuous on I and satisfies $G'(x) = f(x)$ at all but a finite number of points.


THEOREM 2.13  Let f be bounded on [a, b] and continuous at all but a finite number of points.

(a) The function $F(x) = \int_a^x f$ is a primitive of f.
(b) If G is any primitive of f, then $\int_a^b f = G(b) - G(a)$.


Proof  Part (a) comes from Theorem 2.11. Part (b) comes from this and Theorem
4.3 of Chapter 3, which gives a number $\alpha$ such that $G(x) = F(x) + \alpha$, and
hence shows that $G(b) - G(a) = F(b) - F(a)$.


The expression G(b) − G(a) recurs so frequently in the calculation
of integrals that it is convenient to have a special symbol for it. The one
commonly used is

$$G(x)\,\Big|_a^b.$$

With this notation we would write, for example,

$$\int_1^3 x^2\,dx = \frac{x^3}{3}\,\Big|_1^3 = 9 - \frac{1}{3} = 8\tfrac{2}{3}.$$


Sometimes it is convenient to consider the integral $\int_a^b f$ when b < a.
In this case it is defined by

$$\int_a^b f = -\int_b^a f.$$


Exercise 4  Show that $\int_a^b f + \int_b^c f = \int_a^c f$ no matter what the relative positions of a, b, and c.
Show that if two of the integrals exist, so does the third.


Exercise 5  If $F(x) = \int_x^b f$, what is $F'(x)$?


Exercise 6  If $F(x) = \int_a^{x^2} f$, what is $F'(x)$? [Hint: First set $G(y) = \int_a^y f$, and note that the
derivative of $G(x^2)$ can be obtained from the chain rule.]


3


ELEMENTARY FUNCTIONS


The usual way to calculate an integral $\int_a^b f(x)\,dx$ is to find a primitive F and use
Theorem 2.13, which says that

$$\int_a^b f(x)\,dx = F(b) - F(a),$$

although there are occasional interesting cases where the integral can be calculated
for some particular a and b, while the primitive cannot. The primitives
of the elementary functions are as follows, with the symbol $\int f(x)\,dx$ standing
for some primitive.


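$\int x^n\,dx = \dfrac{x^{n+1}}{n+1}$  for $n \ne -1$.  (a)
$\int \dfrac{1}{x}\,dx = \log|x|$.  (b)
$\int \sin x\,dx = -\cos x$.  (c)
$\int \cos x\,dx = \sin x$.  (d)
$\int \tan x\,dx = -\log|\cos x|$.  (e)
$\int \cot x\,dx = \log|\sin x|$.  (f)
$\int \sec x\,dx = \log|\sec x + \tan x|$.  (g)
$\int \csc x\,dx = -\log|\csc x + \cot x|$.  (h)
$\int e^x\,dx = e^x$.  (i)
$\int \log|x|\,dx = x \log|x| - x$.  (j)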
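The table can be spot-checked numerically by differentiating each primitive. The sketch below is only an illustration: the helper `derivative` is a central-difference approximation, the name `TABLE` and the sample points are arbitrary choices, and formula (a) is tried only for n = 3.

```python
import math


def derivative(F, x, h=1e-6):
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# (label, primitive from the table, function it should differentiate to)
TABLE = [
    ("a, n=3", lambda x: x**4 / 4, lambda x: x**3),
    ("b", lambda x: math.log(abs(x)), lambda x: 1 / x),
    ("c", lambda x: -math.cos(x), math.sin),
    ("d", math.sin, math.cos),
    ("e", lambda x: -math.log(abs(math.cos(x))), math.tan),
    ("f", lambda x: math.log(abs(math.sin(x))), lambda x: 1 / math.tan(x)),
    ("g", lambda x: math.log(abs(1 / math.cos(x) + math.tan(x))),
     lambda x: 1 / math.cos(x)),
    ("h", lambda x: -math.log(abs(1 / math.sin(x) + 1 / math.tan(x))),
     lambda x: 1 / math.sin(x)),
    ("i", math.exp, math.exp),
    ("j", lambda x: x * math.log(abs(x)) - x, lambda x: math.log(abs(x))),
]


def max_error(points):
    """Largest gap between the numerical derivative and the claimed integrand."""
    worst = 0.0
    for _label, F, f in TABLE:
        for x in points:
            worst = max(worst, abs(derivative(F, x) - f(x)))
    return worst


# Sample points lie inside intervals where every formula makes sense;
# negative values exercise the absolute-value signs.
SAMPLE_POINTS = [-1.2, -0.7, 0.4, 1.1]
print(max_error(SAMPLE_POINTS))
```

A small printed value is consistent with the table; it is of course a check at a few points, not a proof.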


The formulas are simply verified by differentiation, with a couple of
precautions. The assertion is that the formulas hold on any interval where
they make sense, for primitives are only defined on intervals. For example,
it is not claimed that when n is negative, formula (a) holds on the whole line.
It holds on the interval x > 0 and on the interval x < 0, but not on any interval
containing 0.

The derivative of log|x| has been discussed at points x > 0 (where log|x| =
log x), but not at points x < 0. It must be shown that it is 1/x for x < 0 as
well. This can be done immediately by the composite function formula.

To provide one example of the way the calculations go, consider formula
(e). The function $h(x) = -\log|\cos x|$ is the composite of $f(y) = -\log|y|$ and
$g(x) = \cos x$. Now $f'(y) = -1/y$ and $g'(x) = -\sin x$; therefore, with $y = g(x)$,

$$h'(x) = f'(y)\,g'(x) = \left(-\frac{1}{\cos x}\right)(-\sin x) = \frac{\sin x}{\cos x} = \tan x.$$


The calculation of derivatives is an orderly business. There are simple 
formulas for the derivatives of the elementary functions and for all their usual 
combinations. The combinations may be complicated, but patience and care 
do the job. 

Calculation of integrals is more primitive. Patience and care can well go 
unrewarded while the prizes go to experience and cunning. The reason is that, 
although there are formulas for the primitives of the elementary functions, there 
are no formulas for combinations, except for sums and constant multiples 
(Theorem 2.7). 

There are some general formulas, particularly for transforming one integral 
into another. The procedure is to use these to transform this way and that, 
until finally something recognizable is reached. Cunning in making the trans- 
formations and experience in recognizing the results are the important factors. 
These general formulas and some examples are given in the next sections. 


CHANGE OF VARIABLE 


The most effective tool for the calculation of an integral

$$\int_a^b f(x)\,dx$$

is a suitable change of variable,

$$x = \varphi(t).$$

The formula for making this change is nothing more than an integrated version
of the chain rule. If F is a primitive of f and $G(t) = F(\varphi(t))$, the chain rule
gives

$$G'(t) = F'(\varphi(t))\,\varphi'(t) = f(\varphi(t))\,\varphi'(t); \tag{1}$$


so if $a = \varphi(\alpha)$ and $b = \varphi(\beta)$, then Theorem 2.13 gives

$$\int_a^b f(x)\,dx = F(b) - F(a) = G(\beta) - G(\alpha) = \int_\alpha^\beta f(\varphi(t))\,\varphi'(t)\,dt. \tag{2}$$


THEOREM 4.1  The formula for making the change of variable $x = \varphi(t)$ is

$$\int_a^b f(x)\,dx = \int_\alpha^\beta f(\varphi(t))\,\varphi'(t)\,dt, \quad \text{where } a = \varphi(\alpha) \text{ and } b = \varphi(\beta). \tag{3}$$

It holds under the following conditions on f and $\varphi$:

(a) $\varphi$ is continuous at every point of $[\alpha, \beta]$ and is differentiable at all but
a finite number of points.

(b) f is continuous on $\varphi([\alpha, \beta])$.

(c) $f(\varphi(t))\,\varphi'(t)$ is continuous at all but a finite number of points of
$[\alpha, \beta]$ and is bounded.


Ordinarily the symbol $[\alpha, \beta]$ is used only when $\alpha < \beta$. Here it does not
matter: $[\alpha, \beta]$ designates the closed interval with end points $\alpha$ and $\beta$, whatever
their relative positions.


Proof  What we have to do is justify the steps that were sketched in the opening
paragraph. There is just one minor problem, which will be visible in a
moment.

From the results of Chapter 3 it is known that $\varphi([\alpha, \beta])$ is an interval.
In fact, if m and M are the minimum and maximum of $\varphi$, then $\varphi([\alpha, \beta])$ is
the interval [m, M]. The chain rule asserts that formula (1) holds at any
point t such that $\varphi$ is differentiable at t and F is differentiable at $\varphi(t)$.
The minor problem is that there may be infinitely many points t with
$\varphi(t) = m$ or $\varphi(t) = M$. At these points the formula does not hold because
of the technicality that F is not differentiable at the end points of the
interval on which it is defined. The first step is to take care of these.


LEMMA  Let f be continuous on [m, M], and define

$$\tilde{f}(x) = \begin{cases} f(x) & \text{for } m \le x \le M, \\ f(m) & \text{for } x < m, \\ f(M) & \text{for } x > M. \end{cases}$$

Then $\tilde{f}$ is continuous at every real x.


Exercise 1  Prove the lemma and draw a picture.


The proof is achieved as follows. Choose any point c and set

$$\tilde{F}(x) = \int_c^x \tilde{f}.$$

By Theorems 2.6 and 2.11, $\tilde{F}(x)$ is defined for every real x and satisfies
$\tilde{F}'(x) = \tilde{f}(x)$ for every real x. Here we do not have to exclude any end
points. Set $G(t) = \tilde{F}(\varphi(t))$ for $\alpha \le t \le \beta$. By the chain rule we have

$$G'(t) = \tilde{f}(\varphi(t))\,\varphi'(t) = f(\varphi(t))\,\varphi'(t)$$

at every point where $\varphi$ is differentiable, that is, at all but a finite number
of points. Hence Theorem 2.13 gives

$$\int_a^b f(x)\,dx = \tilde{F}(b) - \tilde{F}(a) = G(\beta) - G(\alpha) = \int_\alpha^\beta f(\varphi(t))\,\varphi'(t)\,dt.$$
The best way to remember the change-of-variable formula is as follows:


1. Replace x by $\varphi(t)$ and dx by $\varphi'(t)\,dt$.
2. Replace the limits a and b by points $\alpha$ and $\beta$ such that $a = \varphi(\alpha)$ and $b = \varphi(\beta)$.


This rule is one of the main reasons for writing $\int_a^b f(x)\,dx$ instead of $\int_a^b f$. It is
easier to remember to replace x by $\varphi(t)$ and dx by $\varphi'(t)\,dt$ than it is to remember
to replace f by the function g defined by $g(t) = f(\varphi(t))\,\varphi'(t)$. There are other
reasons too, but for the present the dx can be considered simply a mnemonic
device with no meaning whatever. It should be emphasized that

$$\int_a^b f(x)\,dx, \qquad \int_a^b f(y)\,dy, \qquad \int_a^b f(u)\,du,$$

and so on, all have exactly the same meaning.


Example 1  Calculate $\int_0^1 x\sqrt{1 - x^2}\,dx$.

Make the change of variable

$$x = (1 - t)^{1/2}, \quad \text{hence} \quad dx = -\tfrac{1}{2}(1 - t)^{-1/2}\,dt.$$

Then

$$\int_0^1 x\sqrt{1 - x^2}\,dx = -\frac{1}{2}\int_1^0 t^{1/2}\,dt = \frac{1}{2}\int_0^1 t^{1/2}\,dt = \frac{1}{3}\,t^{3/2}\,\Big|_0^1 = \frac{1}{3}.$$


Exercise 2  This example shows that we would not want to simplify condition (c) of Theorem
4.1 to read that $\varphi'$ itself is bounded. Discuss.


Exercise 3  The problem in making a change of variable is to discover the right one.
Discover the right one in the example above by working backward, that is, by
putting $t = 1 - x^2$, and $dt = -2x\,dx$. Discuss why this works.
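The change of variable in the example with $x = (1 - t)^{1/2}$ can be checked numerically by comparing Riemann sums on both sides of the formula. This is a sketch under stated assumptions: `midpoint_sum` is an illustrative helper using equal subintervals tagged at their midpoints (so the unbounded $\varphi'$ at t = 1 is never evaluated), and the number of subintervals is an arbitrary choice.

```python
import math


def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b]: n equal subintervals, tags at the midpoints."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def original(x):
    return x * math.sqrt(1.0 - x * x)


def phi(t):
    return math.sqrt(1.0 - t)


def phi_prime(t):
    return -0.5 / math.sqrt(1.0 - t)


def transformed(t):
    # f(phi(t)) * phi'(t); algebraically this is -sqrt(t) / 2.
    return original(phi(t)) * phi_prime(t)


n = 20000
left = midpoint_sum(original, 0.0, 1.0, n)
# Limits: a = 0 = phi(1) and b = 1 = phi(0), so t runs from alpha = 1 to
# beta = 0; reversing the limits changes the sign.
right = -midpoint_sum(transformed, 0.0, 1.0, n)
print(left, right)
```

Both printed values should be close to 1/3, even though $\varphi'$ alone is unbounded near t = 1; only the product $f(\varphi(t))\,\varphi'(t)$ enters the formula.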


Example 2  Calculate $\int \sin^m x \cos^n x\,dx$, where m and n are nonnegative integers.
First suppose that m is odd, say m = 2k + 1. Then

$$\int \sin^m x \cos^n x\,dx = \int \sin^{2k} x \cos^n x \sin x\,dx = \int (1 - \cos^2 x)^k \cos^n x \sin x\,dx.$$

When $(1 - \cos^2 x)^k$ is multiplied out, the integral becomes a sum of terms that
look like this:

$$\int \cos^j x \sin x\,dx.$$

Now it is plain what to do. Put $t = \cos x$, in which case $dt = -\sin x\,dx$; so
the integral is

$$-\int t^j\,dt = -\frac{t^{j+1}}{j+1} = -\frac{\cos^{j+1} x}{j+1}.$$

If m is even but n is odd, the procedure is entirely similar.


Now, suppose that both m and n are even, say m = 2k and n = 2l. In
this case use the identities

$$\cos^2 x = \frac{1 + \cos 2x}{2}, \qquad \sin^2 x = \frac{1 - \cos 2x}{2}.$$

The integral becomes

$$\int \left(\frac{1 - \cos 2x}{2}\right)^k \left(\frac{1 + \cos 2x}{2}\right)^l dx,$$

which is a sum of terms that look like this:

$$\int \cos^r 2x\,dx = \frac{1}{2}\int \cos^r t\,dt \quad \text{with } r \le k + l = \frac{m + n}{2}.$$

This is the same kind of problem as the original problem, but with an exponent
r that is at most half the sum of the original exponents. When r is odd, the
first part of the discussion settles the matter. When r is even, a repetition of
the same procedure cuts the exponent in half again.

When m or n is large (and both are even), the calculation is impossibly long,
but a finite number of steps does reduce the exponent to 0 or to an odd number.
Explicit formulas can be given, but they do not seem particularly relevant at
this stage.


Exercise 4 Calculate f sin’ x cos? x dx. 


Exercise 5 Calculate f sin? x V cos x dx, and state a general theorem about this situation. 


Example 3  Calculate $\int_{-r}^{r} \sqrt{r^2 - x^2}\,dx$.

This is the integral for the area of a semicircle.

When the integrand (function to be integrated) contains a sum or difference
of squares, it is often profitable to make a trigonometric change of variable.
The idea is to draw a right triangle in which the squared quantities appear on
two of the sides. In this case r goes on the hypotenuse and x on either of the
other sides. (If the integrand contained $x^2 - r^2$, then x would go on the
hypotenuse and r on either of the other sides, while if it contained $x^2 + r^2$, both
would go on the other sides.)

In the present case it looks as shown in Figure 2, so that $x/r = \sin\theta$.
Thus, $x = r\sin\theta$ and $dx = r\cos\theta\,d\theta$. The lower limit is $\alpha = -\pi/2$ (or any
other point $\alpha$ with $\sin\alpha = -1$?), and the upper limit is $\beta = \pi/2$. Inspection
of the triangle shows that $\sqrt{r^2 - x^2}/r = \cos\theta$, so the integral becomes

$$r^2 \int_{-\pi/2}^{\pi/2} \cos^2\theta\,d\theta = r^2 \int_{-\pi/2}^{\pi/2} \frac{1 + \cos 2\theta}{2}\,d\theta = \frac{r^2}{2}\left(\theta + \frac{\sin 2\theta}{2}\right)\Big|_{-\pi/2}^{\pi/2} = \frac{\pi r^2}{2}. \tag{4}$$


Exercise 6  What about the limits of integration? We have taken $\alpha = -\pi/2$ and
$\beta = \pi/2$, but it would appear from Theorem 4.1 that we can take $\alpha$ to be any
point such that $\sin\alpha = -1$, and $\beta$ to be any point such that $\sin\beta = 1$. If we
take, for instance, $\alpha = -\pi/2$, $\beta = 5\pi/2$, then we seem to get $3\pi r^2/2$, which is
not the answer above. Where is the trouble?


5


INTEGRATION BY PARTS


The formula for integration by parts is nothing more than an integrated version
of the formula for the derivative of a product: $(fg)' = f'g + fg'$. If this is
integrated from a to b the result is

$$\int_a^b f'g\,dx + \int_a^b fg'\,dx = fg\,\Big|_a^b. \tag{1}$$

The formula is useful when the integrand can be written as a product in such a
way that one factor can be integrated and the other differentiated with a net
effect that is good.


Example 1  Calculate $\int_a^b x \sin x\,dx$.


If x is differentiated and sin x is integrated, the effect is clearly good.
Therefore, let $g(x) = x$ and $f(x) = -\cos x$. Then

$$\int_a^b x \sin x\,dx = \int_a^b f'g\,dx = -\int_a^b fg'\,dx + fg\,\Big|_a^b
= \int_a^b \cos x\,dx - x\cos x\,\Big|_a^b
= \sin b - b\cos b - \sin a + a\cos a.$$

Another notation is sometimes easier to remember. With the convention
that if u is a function of x, then $du = u'\,dx$, formula (1) reads

$$\int_a^b u\,dv + \int_a^b v\,du = uv\,\Big|_a^b. \tag{2}$$


Example 2  Calculate $\int_1^3 \log x\,dx$.


Clearly, it will be helpful to differentiate log x and to integrate 1. Therefore,
let $u = \log x$ and $dv = dx$. Then $du = (1/x)\,dx$ and $v = x$. Then

$$\int_1^3 \log x\,dx = \int_1^3 u\,dv = -\int_1^3 v\,du + uv\,\Big|_1^3 = -\int_1^3 dx + x\log x\,\Big|_1^3 = -2 + 3\log 3.$$
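The closed forms obtained by parts for $\int_a^b x \sin x\,dx$ and $\int_1^3 \log x\,dx$ can be compared with direct numerical integration. The sketch below is only a sanity check: `midpoint_sum` is an illustrative helper, and the limits a = 0.5, b = 2.0 used for the first integral are arbitrary sample values.

```python
import math


def midpoint_sum(f, a, b, n=100000):
    """Riemann sum of f on [a, b]: n equal subintervals, tags at the midpoints."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Integral of x sin x, with illustrative limits a = 0.5 and b = 2.0.
a, b = 0.5, 2.0
numeric_1 = midpoint_sum(lambda x: x * math.sin(x), a, b)
closed_1 = math.sin(b) - b * math.cos(b) - math.sin(a) + a * math.cos(a)

# Integral of log x from 1 to 3.
numeric_2 = midpoint_sum(math.log, 1.0, 3.0)
closed_2 = -2.0 + 3.0 * math.log(3.0)

print(numeric_1, closed_1)
print(numeric_2, closed_2)
```

Each pair of printed numbers should agree to many decimal places.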


Example 3  Calculate $\int_0^x e^t \cos t\,dt$.


This time the trick is to integrate by parts twice. Things do not look so
good after the first one, but they look much better after the second.

$$\int_0^x e^t \cos t\,dt = -\int_0^x e^t \sin t\,dt + e^x \sin x
= -\int_0^x e^t \cos t\,dt + e^x \cos x - 1 + e^x \sin x,$$

from which it follows that

$$\int_0^x e^t \cos t\,dt = \frac{e^x \cos x + e^x \sin x - 1}{2}.$$


The relevant theorem, which results directly from the definition of a primitive
and the formula for the derivative of a product, is as follows:


THEOREM 5.1  Let f and g be continuous on [a, b] and differentiable at all but a finite number
of points. Let $f'$ and $g'$ be bounded and continuous at all but a finite number
of points. Then

$$\int_a^b f'g\,dx + \int_a^b fg'\,dx = fg\,\Big|_a^b.$$


Exercise 1  Write out the proof of Theorem 5.1 in full detail.


6


RIEMANN SUMS


If f is integrable, then for any partition p

$$\underline{S}(p) \le \underline{S} = \int_a^b f(x)\,dx = \overline{S} \le \overline{S}(p). \tag{1}$$

Moreover, for certain partitions p, the sums $\underline{S}(p)$ and $\overline{S}(p)$ approximate the
integral as well as we please. We shall show now that these sums approximate
the integral for every sufficiently fine partition. In view of formula (1) it is
enough to prove the following theorem.


THEOREM 6.1  Let f be integrable. For every positive number $\epsilon$ there is a positive number $\delta$
such that if p is any partition with $|p| < \delta$, then

$$\overline{S}(p) - \underline{S}(p) < \epsilon.$$


Proof  Let $\epsilon > 0$ be given. Since f is integrable, there is some partition q such
that $\overline{S}(q) - \underline{S}(q) < \epsilon$. Let N be the number of points in q, and let $\delta_0$ be
the length of the smallest interval. Let $M = \sup\{|f(x)| : a \le x \le b\}$. We
shall prove the following assertion.


LEMMA 6.2  If p is any partition with $|p| < \delta < \delta_0$, then

$$\overline{S}(p) - \underline{S}(p) < \epsilon + 4MN\delta.$$


This will prove the theorem, for M and N are fixed numbers, and we can
take $\delta$ as small as we please.


Proof of the Lemma  Let r be the common refinement of p and q. Since r is a refinement of q,
we have

$$\overline{S}(r) - \underline{S}(r) \le \overline{S}(q) - \underline{S}(q) < \epsilon. \tag{2}$$

On the other hand, the sums for r and p can be compared. The terms in
$\overline{S}(r)$ and $\overline{S}(p)$ are the same with the following exceptions. If y is a point
of r that is not in p, then the two adjacent points $x_{i-1}$ and $x_i$ must belong
to p. This is the effect of taking $\delta < \delta_0$. $\overline{S}(p)$ contains the term
$M_i(x_i - x_{i-1})$, while in place of it $\overline{S}(r)$ contains the sum $M_i'(y - x_{i-1}) + M_i''(x_i - y)$,
where $M_i'$ and $M_i''$ are the suprema over the two smaller intervals $[x_{i-1}, y]$
and $[y, x_i]$. The error committed in replacing the single term by the sum
of the two is clearly at most $2M(x_i - x_{i-1}) < 2M\delta$. This error is committed
at worst at the N points of q, so the total error is at most $2MN\delta$.
Thus, we have

$$\overline{S}(p) \le \overline{S}(r) + 2MN\delta,$$

and, similarly,

$$\underline{S}(p) \ge \underline{S}(r) - 2MN\delta.$$

Combining these with (2), we get the lemma.


There are other sums associated with a partition p that lie between $\underline{S}(p)$
and $\overline{S}(p)$, and hence also approximate the integral. For each i, choose any
point $\xi_i$ between $x_{i-1}$ and $x_i$ and let

$$S(p) = \sum_{i=1}^{n} f(\xi_i)(x_i - x_{i-1}). \tag{3}$$

[The sums $\overline{S}(p)$ and $\underline{S}(p)$ are obtained by choosing $\xi_i$ at the maximum or minimum
of f on $[x_{i-1}, x_i]$, at least if f is continuous so that the maximum and
minimum exist.]


Exercise 1  For the sake of logical notation, the definition of partition should be altered so
as to include the points $\xi_1, \ldots, \xi_n$ as well as the points $x_0, \ldots, x_n$. State
the definition correctly.

It is clear that for every partition p,

$$\underline{S}(p) \le S(p) \le \overline{S}(p), \tag{4}$$

so Theorem 6.1 has the following consequence.


THEOREM 6.3  Let f be integrable. For every positive number $\epsilon$ there is a positive number $\delta$
such that if p is any partition with $|p| < \delta$, then

$$\left| S(p) - \int_a^b f(x)\,dx \right| < \epsilon.$$
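The statement that the sums S(p) approach the integral however the points $\xi_i$ are chosen can be watched numerically. In the sketch below the tags are picked at random in each subinterval; the function names, the sample function $f(x) = x^2$ on [0, 1], and the fixed random seed are all illustrative assumptions.

```python
import random


def riemann_sum(f, points, tags):
    """Sum of f(tag_i) * (x_i - x_{i-1}); tag_i must lie in [x_{i-1}, x_i]."""
    return sum(f(t) * (right - left)
               for left, right, t in zip(points, points[1:], tags))


def random_tagged_partition(a, b, n, rng):
    """Equal subintervals of [a, b] with one randomly chosen tag in each."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    tags = [rng.uniform(left, right) for left, right in zip(points, points[1:])]
    return points, tags


def square(x):
    return x * x


rng = random.Random(0)  # fixed seed so that the run is repeatable
errors = {}
for n in (10, 100, 1000):
    points, tags = random_tagged_partition(0.0, 1.0, n, rng)
    errors[n] = abs(riemann_sum(square, points, tags) - 1.0 / 3.0)
    print(n, errors[n])
```

Because S(p) and the integral both lie between the lower and upper sums, and those differ by 1/n for this function, each error is at most 1/n whatever tags the random generator produces.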


Exercise 2 (Converse of the Theorem)  Let f be any function on [a, b] and let L be a number. If for every $\epsilon > 0$ there
is a $\delta > 0$ such that if $|p| < \delta$, then $|S(p) - L| < \epsilon$, then f is integrable, and the
integral is L.


The statement of Exercise 2 could be taken as the definition of integrability
and of the integral. Whether it would be a more natural definition than the
one given depends on the problem one has in mind at the time. In the problem
of area, the original definition is the natural one, for the sums $\underline{S}(p)$ and $\overline{S}(p)$ are
the ones that trap the area in between. In the problem of arc length, which is
discussed in Section 7, these two have no special role, but certain S(p) do. In
problems of volume (Section 9), $\underline{S}(p)$ and $\overline{S}(p)$ do play a role in special cases
but not in the general case.

The big advantage of the original definition is that it provides some definite
numbers $\underline{S}$ and $\overline{S}$ to work with. The alternative definition gives no clue as to
what the number L is.

The sums S(p) are usually called the Riemann sums for the integral. The
sums $\overline{S}(p)$ and $\underline{S}(p)$ are called either the upper and lower Riemann sums or the
upper and lower Darboux sums.


ARC LENGTH 


There is a nice integral formula for the length of the curve $y = f(x)$, $a \le x \le b$.
The natural way to define the length is to approximate the curve by inscribed
polygons.

Choose points $a = x_0 < x_1 < \cdots < x_n = b$, let $y_i = f(x_i)$, and let $A_i = (x_i, y_i)$.
The polygon $P = (A_0, A_1, \ldots, A_n)$ is the sequence of line segments
$A_0A_1, A_1A_2, \ldots, A_{n-1}A_n$. Its length is the sum of the lengths of the segments,
which is

$$l(P) = \sum_{i=1}^{n} \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}. \tag{1}$$

The polygon P is an approximation to the curve $y = f(x)$ as the number

$$|P| = \max_i\,(x_i - x_{i-1})$$

goes to 0, so it is natural to make the following definition:


DEFINITION 7.1  The length of the curve $y = f(x)$, $a \le x \le b$, is the limit of l(P) as |P|
approaches 0. This means that the length is L if for each positive number $\epsilon$
there is a positive number $\delta$ such that $|l(P) - L| < \epsilon$ whenever $|P| < \delta$.


Let us make use of the mean-value theorem in formula (1). There is a
point $\xi_i$ between $x_{i-1}$ and $x_i$ such that

$$y_i - y_{i-1} = f'(\xi_i)(x_i - x_{i-1}). \tag{2}$$

Therefore,

$$l(P) = \sum_{i=1}^{n} \sqrt{1 + f'(\xi_i)^2}\,(x_i - x_{i-1}), \tag{3}$$

which is nothing but the sum S(p) for the function $\sqrt{1 + f'^2}$ and the partition p
composed of the sequences $(x_0, \ldots, x_n)$ and $(\xi_1, \ldots, \xi_n)$. Theorem 6.3
has the following consequence.


THEOREM 7.2  Let f be continuous on [a, b] and differentiable at each interior point. If
$\sqrt{1 + f'^2}$ is integrable, then the length of the curve $y = f(x)$ exists and is
equal to $\int_a^b \sqrt{1 + f'^2}\,dx$.


The hypothesis that f is continuous on [a, b] and differentiable at each
interior point is what is needed to apply the mean-value theorem.


Example  Find the arc length of the unit circle $y = \sqrt{1 - x^2}$ between x = a and x = b.


In this case $f'(x) = -x(1 - x^2)^{-1/2}$, so $1 + f'^2 = (1 - x^2)^{-1}$; therefore,

$$\text{arc length} = \int_a^b \frac{dx}{\sqrt{1 - x^2}} = \arcsin b - \arcsin a. \tag{4}$$

Taking, for instance, a = 0 and b = 1, we find that the length of a quarter-circle is

$$\arcsin 1 - \arcsin 0 = \frac{\pi}{2}.$$

This is perfectly correct, but it is not justified by Theorem 7.2. The integrand
is not bounded near 1, so it is not Riemann integrable.

The problem here lies not with the curve, but with the equation $y = f(x)$
that represents it. The same quarter-circle is also represented by the equation
$x = g(y) = \sqrt{1 - y^2}$, but the same problem arises. In the first case the derivative
$f'$ is unbounded because the tangent line to the circle at (1, 0) is parallel to
the y axis. In the second case the derivative $g'$ is unbounded because the tangent
line at (0, 1) is parallel to the x axis. The problem arises because the x and y
axes are forced to play a special role.

A better way to write the equation of a curve is to express both x and y as
functions of a third variable t:

$$x = x(t), \quad y = y(t), \quad a \le t \le b. \tag{5}$$

It is convenient to think of t as the time and of the point (x(t), y(t)) as the position
of a moving object at time t. During the time interval [a, b] the object traces
out a curve in the plane. The equations (5) are called parametric equations of the
curve, and t is called the parameter.


Exercise 1  Show that $x = \cos t$, $y = \sin t$, $0 \le t \le 2\pi$, are parametric equations of the
circle $x^2 + y^2 = 1$.


Exercise 2  Show that $x = a\cos t$, $y = b\sin t$, $0 \le t \le 2\pi$, are parametric equations of the
ellipse $x^2/a^2 + y^2/b^2 = 1$.


The examples in the exercises show a second advantage of the parametric
equations. They describe the full circle and the full ellipse. The equation
$y = f(x)$ can describe either the top half or the bottom half, but not both at
once. On the other hand, any curve in the form $y = f(x)$ can be put immediately
in parametric form by setting

$$x = t, \quad y = f(t).$$

The natural way to approximate a parametric curve

$$x = x(t), \quad y = y(t), \quad a \le t \le b,$$

is to partition the time interval [a, b]. If $p = (t_0, \ldots, t_n)$ is such a partition
and $A_i = (x(t_i), y(t_i))$, then the sequence of segments $A_0A_1, A_1A_2, \ldots, A_{n-1}A_n$
is an inscribed polygon whose length is

$$l(p) = \sum_{i=1}^{n} \sqrt{(x(t_i) - x(t_{i-1}))^2 + (y(t_i) - y(t_{i-1}))^2}. \tag{6}$$

The big square root is just the distance from $A_{i-1}$ to $A_i$, or the length of the
segment $A_{i-1}A_i$.


DEFINITION 7.3  The length of the curve $x = x(t)$, $y = y(t)$, $a \le t \le b$, is the limit of l(p)
as $|p| \to 0$. This means that the length is L if for each positive number $\epsilon$
there is a positive number $\delta$ such that $|l(p) - L| < \epsilon$ whenever $|p| < \delta$.
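The definition of length by inscribed polygons can be tried out directly. The sketch below computes l(p) for the quarter of the unit circle given by $x = \cos t$, $y = \sin t$, $0 \le t \le \pi/2$, using equally spaced parameter values; the function names and the choice of partitions are illustrative assumptions.

```python
import math


def polygon_length(x, y, ts):
    """Length l(p) of the polygon inscribed in (x(t), y(t)) at parameters ts."""
    total = 0.0
    for t0, t1 in zip(ts, ts[1:]):
        # Distance between consecutive vertices A_{i-1} and A_i.
        total += math.hypot(x(t1) - x(t0), y(t1) - y(t0))
    return total


def uniform_partition(a, b, n):
    """Return the n + 1 points of the partition of [a, b] into n equal parts."""
    return [a + (b - a) * i / n for i in range(n + 1)]


# Quarter of the unit circle: x = cos t, y = sin t, 0 <= t <= pi/2.
lengths = {}
for n in (2, 8, 32, 128):
    ts = uniform_partition(0.0, math.pi / 2, n)
    lengths[n] = polygon_length(math.cos, math.sin, ts)
    print(n, lengths[n], math.pi / 2 - lengths[n])
```

The polygon lengths increase toward $\pi/2$ as the partition is refined. The parametric form causes no trouble at the end points, where the formula based on $y = f(x)$ has an unbounded integrand.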


Again, there is a nice integral formula for the length, which results from
using the mean-value theorem in formula (6) to show that l(p) is almost a
Riemann sum. Choose a point $\xi_i$ between $t_{i-1}$ and $t_i$ so that

$$x(t_i) - x(t_{i-1}) = x'(\xi_i)(t_i - t_{i-1}),$$

and choose a point $\eta_i$ between $t_{i-1}$ and $t_i$ so that

$$y(t_i) - y(t_{i-1}) = y'(\eta_i)(t_i - t_{i-1}).$$

Substitute these into formula (6) to get

$$l(p) = \sum_{i=1}^{n} \sqrt{x'(\xi_i)^2 + y'(\eta_i)^2}\,(t_i - t_{i-1}). \tag{7}$$

This is very much like a Riemann sum for the function

$$\sqrt{x'(t)^2 + y'(t)^2}.$$

Indeed, it would be a Riemann sum if it were true that $\eta_i = \xi_i$, so it suggests
the formula

$$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2}\,dt. \tag{8}$$


THEOREM 7.4  The length of the curve $x = x(t)$, $y = y(t)$, $a \le t \le b$, is given by formula
(8) if the derivatives $x'$ and $y'$ exist on (a, b) and are uniformly continuous.


Proof of Theorem 7.4  The idea is to compare l(p) as it appears in formula (7) with the Riemann
sum

$$S(p) = \sum_{i=1}^{n} \sqrt{x'(t_i)^2 + y'(t_i)^2}\,(t_i - t_{i-1}).$$


Exercise 3  To make this comparison you will need the inequality

$$\left| \sqrt{a^2 + b^2} - \sqrt{c^2 + d^2} \right| \le \sqrt{(a - c)^2 + (b - d)^2}. \tag{9}$$

Prove this inequality by considering the triangle with vertices (a, b), (c, d), and
(0, 0) and using the geometric theorem that the sum of any two sides of a triangle
is greater than the third.


Exercise 4  Use inequality (9) to show that

$$|l(p) - S(p)| \le \sum_{i=1}^{n} \sqrt{(x'(\xi_i) - x'(t_i))^2 + (y'(\eta_i) - y'(t_i))^2}\,(t_i - t_{i-1}). \tag{10}$$


Let $\epsilon > 0$ be given, and use the uniform continuity of $x'$ and $y'$ to find
$\delta > 0$ such that

$$|x'(t) - x'(s)| < \epsilon \quad \text{and} \quad |y'(t) - y'(s)| < \epsilon \quad \text{if } |t - s| < \delta.$$

If p is any partition with $|p| < \delta$, then formula (10) gives

$$|l(p) - S(p)| \le \sum_{i=1}^{n} \sqrt{\epsilon^2 + \epsilon^2}\,(t_i - t_{i-1}) = \sqrt{2}\,\epsilon\,(b - a).$$

This proves the theorem because S(p) differs from the integral in (8) by
as little as we please when |p| is small.


Remark  The assumption in Theorem 7.4 that $x'$ and $y'$ are uniformly continuous on
(a, b) is certainly stronger than is desirable. It rules out polygons! A more
reasonable assumption would be that $x'$ and $y'$ are uniformly continuous on
each open interval of a partition of [a, b]. To handle this type of thing it is
necessary to make an analysis of arc length along the lines of our analysis of the
Riemann integral. In particular it should be shown that if c is a point between
a and b, then the arc length over [a, b] exists if and only if the arc lengths over
[a, c] and [c, b] exist, in which case it is the sum. The crucial feature of arc
length that makes these things relatively easy to prove is the following theorem.


THEOREM 7.5  The arc length of the curve $x = x(t)$, $y = y(t)$, $a \le t \le b$, is the sup of
l(p) over all partitions p (in the sense that if either exists, so does the other and
the two are equal).


We shall not prove this theorem at present or the facts above, but why not
try it for yourselves? (There are proofs in Section 6 of Chapter 8.)


Exercise 5  Let C be the circular arc described by $x = r\cos\theta$, $y = r\sin\theta$, $a \le \theta \le b$, where
$0 < b - a \le 2\pi$. Show that the length of C is $r(b - a)$ and that the area of
the corresponding circular sector is $r^2(b - a)/2$. Do this both by integration
and by geometry (similar triangles, etc.).


POLAR COORDINATES 


Some curves and figures in the plane are described more simply in what are
called polar coordinates than in the rectangular coordinates that we have been
using. A point $(x, y) \ne (0, 0)$ in the plane is completely determined by its
distance from the origin and the counterclockwise angle from the positive x axis
to the half-line starting at the origin and going through (x, y). These two
numbers are the polar coordinates of the point (x, y). More precisely, let r be
the distance from (x, y) to the origin (that is, $r = \sqrt{x^2 + y^2}$), and let $\theta$ be the
arc length (counterclockwise) along the unit circle from the point (1, 0) to the
point (x/r, y/r). By the definition of the sine and cosine we have

$$x = r\cos\theta \quad \text{and} \quad y = r\sin\theta.$$


DEFINITION 8.1  The numbers r and $\theta$ are called polar coordinates of the point (x, y) if

$$x = r\cos\theta \quad \text{and} \quad y = r\sin\theta. \tag{1}$$


The argument above shows that each point (x, y) ≠ (0, 0) has exactly one set of
polar coordinates r and θ such that r > 0 and 0 ≤ θ < 2π.

Exercise 1 Discuss the polar coordinates of the point (0, 0).

Exercise 2 The pairs (r, θ) and (ρ, φ) are polar coordinates of the same point ≠ (0, 0) if
and only if either

(a) ρ = r and φ = θ + 2kπ for some integer k, or
(b) ρ = −r and φ = θ + (2k + 1)π for some integer k.
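The normalized polar coordinates (r > 0, 0 ≤ θ < 2π) and the two ways of changing them without moving the point can be tried out numerically. The following is a minimal sketch; the function names `to_polar` and `from_polar` are assumptions made for illustration, and the angle is obtained from the standard library's `math.atan2` rather than from arc length on the unit circle.

```python
import math

def to_polar(x, y):
    """Return polar coordinates (r, theta) with r > 0 and 0 <= theta < 2*pi."""
    if x == 0 and y == 0:
        raise ValueError("the origin has no unique polar angle")
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    if theta >= 2 * math.pi:  # guard against rounding up to 2*pi
        theta = 0.0
    return r, theta

def from_polar(r, theta):
    """Rectangular coordinates of the point with polar coordinates (r, theta)."""
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(1.0, -1.0)                  # a point in the fourth quadrant
same = from_polar(r, theta + 4 * math.pi)       # case (a) with k = 2
flipped = from_polar(-r, theta + 3 * math.pi)   # case (b) with k = 1
print(r, theta, same, flipped)
```

Both `same` and `flipped` land back on (1, −1) up to rounding, in line with cases (a) and (b).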


Example 1 Draw the curve whose equation in polar coordinates is r = sin θ, 0 ≤ θ ≤ 2π.
This is just the circle with center at (0, 1/2) and radius 1/2. To see this, multiply
through by r to get r² = r sin θ, or x² + y² − y = 0, or x² + (y − 1/2)² − 1/4 = 0.
As θ goes from 0 to π, the point (r, θ) runs around the circle once in the
counterclockwise direction. As θ goes from π to 2π, the point runs around again in the
counterclockwise direction, for in this case r is negative.
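A quick numerical check of this example: the sketch below samples the curve r = sin θ and measures how far each sample is from the circle x² + (y − 1/2)² = 1/4. The name `point_on_curve` and the choice of 360 sample angles are illustrative assumptions.

```python
import math

def point_on_curve(theta):
    """Rectangular coordinates of the point (r, theta) on r = sin(theta)."""
    r = math.sin(theta)
    return r * math.cos(theta), r * math.sin(theta)

thetas = [2 * math.pi * i / 360 for i in range(360)]
max_defect = max(
    abs(x * x + (y - 0.5) ** 2 - 0.25)
    for x, y in map(point_on_curve, thetas)
)

# The point at theta + pi coincides with the point at theta (there r is negative).
x0, y0 = point_on_curve(1.0)
x1, y1 = point_on_curve(1.0 + math.pi)
print(max_defect, (x0, y0), (x1, y1))
```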


Example 2 Draw the curve whose equation is r = sin 2θ, 0 ≤ θ ≤ 2π. It looks as shown
in Figure 3. As θ goes from 0 to π/2, the point (r, θ) runs around loop 1 in
the counterclockwise direction. As θ goes from π/2 to π, (r, θ) runs
counterclockwise around loop 2. Note that r is negative!

Exercise 3 Analyze the examples in detail.

Exercise 4 In Example 2 it appears that the curve has two tangent lines at the origin: the
x and y axes. Try to discuss this question.


Figure 3 


Exercise 5 Find equations in rectangular coordinates for the curve in Example 2.

The reasoning of Section 1 leads to a simple formula for the area of a set
of the form

$$A = \{(r, \theta) : a \le \theta \le b \text{ and } 0 \le r \le f(\theta)\}, \tag{2}$$

where f is a given positive function on [a, b] and b − a ≤ 2π.

Exercise 6 Why assume that b − a ≤ 2π?


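As in Section 1, let a = θ_0 < θ_1 < · · · < θ_n = b be a partition of [a, b],
let M_i and m_i be the sup and inf of f on [θ_{i−1}, θ_i], and let

$$\overline{R}_i = \{(r, \theta) : \theta_{i-1} \le \theta \le \theta_i \text{ and } 0 \le r \le M_i\},$$
$$\underline{R}_i = \{(r, \theta) : \theta_{i-1} \le \theta \le \theta_i \text{ and } 0 \le r \le m_i\}.$$

It is clear that

$$\bigcup_i \underline{R}_i \subset A \subset \bigcup_i \overline{R}_i,$$

so

$$\sum_i \operatorname{area} \underline{R}_i \le \operatorname{area} A \le \sum_i \operatorname{area} \overline{R}_i. \tag{3}$$

In this case the $\overline{R}_i$ and $\underline{R}_i$ are not rectangles but circular sectors. The picture
looks as shown in Figure 4.
By Exercise 5 of Section 7, we have

$$\operatorname{area} \overline{R}_i = \frac{M_i^2(\theta_i - \theta_{i-1})}{2}, \qquad \operatorname{area} \underline{R}_i = \frac{m_i^2(\theta_i - \theta_{i-1})}{2}, \tag{4}$$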


Figure 4 


so the sums in inequality (3) are nothing but the lower and upper Riemann
sums for the function f(θ)²/2. This gives the following theorem.

THEOREM 8.2 Let f be nonnegative and Riemann integrable on [a, b], 0 < b − a ≤ 2π.
The area of the set

$$A = \{(r, \theta) : a \le \theta \le b \text{ and } 0 \le r \le f(\theta)\}$$

is the integral

$$\frac{1}{2}\int_a^b f(\theta)^2\,d\theta.$$
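The sector sums that squeeze the area can be computed directly. The sketch below does so for r = sin θ on [0, π], whose region is the disc of radius 1/2 from Example 1, so the sums should bracket π/4, the value of the integral of f(θ)²/2. The function name `sector_sums` is illustrative; the code assumes f is monotone on each subinterval, so that its sup and inf there are attained at the endpoints.

```python
import math

def sector_sums(f, a, b, n):
    """Lower and upper sums of the sector areas m_i^2 * dtheta / 2 and M_i^2 * dtheta / 2.

    Assumes f is nonnegative and monotone on each subinterval of the uniform
    partition, so that inf and sup of f occur at the subinterval endpoints.
    """
    lower = upper = 0.0
    for i in range(n):
        t0 = a + (b - a) * i / n
        t1 = a + (b - a) * (i + 1) / n
        lo, hi = sorted((f(t0), f(t1)))
        lower += lo * lo * (t1 - t0) / 2
        upper += hi * hi * (t1 - t0) / 2
    return lower, upper

# n is even, so pi/2 (where sin stops increasing) is a partition point.
lower, upper = sector_sums(math.sin, 0.0, math.pi, 1000)
print(lower, math.pi / 4, upper)
```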


Exercise 7 Calculate the area of one of the loops in Example 2. 


Any curve with the equation r = f(θ), a ≤ θ ≤ b, in polar coordinates can
be put immediately in parametric form. Because of the relations x = r cos θ,
y = r sin θ, the parametric equations are

$$x = f(\theta)\cos\theta, \qquad y = f(\theta)\sin\theta, \qquad a \le \theta \le b.$$

Exercise 8 Write parametric equations for the curve in Example 2 and find its length.


Exercise 9 As captain of the Coast Guard Cutter U.S.S. Polar Coordinates, you are chasing 
a rumrunner off the foggy coast of San Francisco. The fog lifts for a moment, 
you spot him 4 miles due north, and then the fog comes down. From past 
experience with this fellow you know that he will take off on a straight line at 
full speed, which is 10 mph. The U.S.S. Polar Coordinates will do 30 mph. 
What plan should you follow to catch him? 


9 VOLUME


There is also a nice integral formula for the volume of a set in three dimensions.
Let V be such a set. For each x_0, let V(x_0) be the section of V by the plane
x = x_0; that is,

$$V(x_0) = \{(y, z) : (x_0, y, z) \in V\},$$

and let A(x_0) be the area of V(x_0). [V(x_0) is a set in the plane, so this makes
sense.]

THEOREM 9.1 If V_a^b is the part of V between the planes x = a and x = b, a < b, then
(under mild restrictions on V)

$$\operatorname{volume} V_a^b = \int_a^b A(x)\,dx.$$


Example 1 Find the volume of a right circular cone with height h and base of radius r.

Put the vertex of the cone at the origin and the axis along the positive x
axis. The first problem is to calculate A(x). Now, V(x) is a circle, so the
problem is to find its radius. The way to do this is to look at the section of V
cut by the (x, y) plane. This is just a triangle, and consideration of similar
triangles shows that the radius of the circle V(x) is (r/h)x. Therefore, A(x) =
(πr²/h²)x², and

$$\text{volume} = \int_0^h \frac{\pi r^2}{h^2}\,x^2\,dx = \frac{\pi r^2 h}{3}.$$
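The slicing formula can be tried numerically on the cone. The sketch below adds up the volumes of thin cylinders A(ξ)·Δx with A(x) = (πr²/h²)x² and compares the total with πr²h/3, the value of the integral of A from 0 to h. The function name `volume_by_slices`, the midpoint choice of ξ, and the sample dimensions r = 2, h = 3 are assumptions made for illustration.

```python
import math

def volume_by_slices(area, a, b, n):
    """Riemann sum of area(xi) * dx over n equal slices, with xi the slice midpoint."""
    dx = (b - a) / n
    return sum(area(a + (i + 0.5) * dx) for i in range(n)) * dx

r, h = 2.0, 3.0

def cone_section_area(x):
    """Area of the circular section of the cone at distance x from the vertex."""
    return math.pi * (r * x / h) ** 2

approx = volume_by_slices(cone_section_area, 0.0, h, 1000)
exact = math.pi * r * r * h / 3
print(approx, exact)
```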


Let us see what can be done about a proof of the theorem. Let p be a
partition, and consider the Riemann sum

$$S(p) = \sum_{i=1}^{n} A(\xi_i)(x_i - x_{i-1}).$$

We would like to interpret this sum as the volume of a set that approximates
the given set V.

For each i, let R(ξ_i) be the cylinder whose base is V(ξ_i) and whose height
is x_i − x_{i−1} (Figure 5); more precisely, let

$$R(\xi_i) = \{(x, y, z) : (y, z) \in V(\xi_i) \text{ and } x_{i-1} \le x \le x_i\}.$$


Figure 5 


If it is assumed that the volume of a cylinder is the area of the base times the
height (which seems fair enough), then A(ξ_i)(x_i − x_{i−1}) is the volume of R(ξ_i);
therefore, S(p) is the volume of

$$R(p) = \bigcup_i R(\xi_i).$$

This set “approaches” the set V as |p| → 0, so S(p) ought to approach the
volume of V, and the theorem ought to be true.

The obvious idea is to try to justify the argument by using upper and lower
sums $\overline{S}(p)$ and $\underline{S}(p)$ as in the case of area. But there is a problem. The set
inclusion that is needed [corresponding to formulas (1) and (4) of Section 1] is

$$\underline{R}(p) \subset V \subset \overline{R}(p), \tag{1}$$

which is simply not true. The fact that one set in the plane has a smaller area
than another does not at all imply that it is included in the other. (See
Exercise 2.) The theorem itself is true (subject to a very mild restriction on the
set V, which is always satisfied in practice), and we shall use it freely; but we
cannot prove it fully without a sound definition of volume, which will appear in
Chapter 13.

There are, however, some interesting cases in which formula (1) is true. 
One is the solid of revolution. The set V is a solid of revolution about the x 
axis if each of the sections V(x) is a circle with center (0,0). It is plain that if 
one such circle has a smaller area than another, then it is contained in the other. 
In this case formula (1) is obvious, and the proof of Theorem 9.1 is complete, 
except for the assumption that the volume of a cylinder is the area of the base 
times the height (which is also elementary, since the cylinders in question are 
right circular cylinders!). One example of the solid of revolution is the cone of 
Example 1. An example that is not a solid of revolution, but where the same 
argument works, is the following. 


Exercise 1 Find the volume of the pyramid with height h and base a square of side s.

Exercise 2 Let C be a circle of radius r in the plane x = h (arbitrary center), and let V be
the cone obtained by joining each point of C to the origin. Find the volume of V.

Exercise 3 Let D be the doughnut obtained by revolving the circle x² + (y − 2)² ≤ 1 about
the x axis. Find the volume of D by looking at D as the difference of two
solids of revolution. What is the equation of the surface that bounds D?


Chapter 13 contains a more general theory of integration that puts the 
notion of volume on a perfectly sound footing. 


10 IMPROPER INTEGRALS


Reconsideration of the arc length of a quarter-circle suggests an extension of
our notion of integral. We have seen in Section 7 that the length of the curve
y = √(1 − x²) on 0 ≤ x ≤ b is the number

$$L(b) = \int_0^b \frac{dx}{\sqrt{1 - x^2}} = \arcsin b \qquad \text{for } 0 \le b < 1.$$

If b approaches 1, then the limit of L(b) is π/2, which is exactly the length of the
curve on 0 ≤ x ≤ 1.

This suggests that perhaps we should maintain that the integral

$$\int_0^1 \frac{dx}{\sqrt{1 - x^2}}$$

exists, even though the integrand is unbounded and is not Riemann integrable
in the original sense. Such an integral is called an improper integral, and the
definition is as follows.

DEFINITION 10.1 Let f be integrable on [a, b] for each b < c. If the limit

$$\lim_{\substack{b \to c \\ b < c}} \int_a^b f(x)\,dx$$

exists, it is called the improper integral of f from a to c. Similarly, if f is
integrable on [b, c] for each b > a and the limit

$$\lim_{\substack{b \to a \\ b > a}} \int_b^c f(x)\,dx$$

exists, it is called the improper integral of f from a to c. In both cases the
integral is written $\int_a^c f(x)\,dx$ as usual.

Exercise 1 The first part of the definition makes sense when c = ∞, the second part when
a = −∞. Write it out in full in both cases.

Exercise 2

Exercise 3




Neither of the integrals 


! | 
/ Bl (ee see fy (1) 
~1q)x] ) xe— x 


falls under the hypotheses envisioned in Definition 10.1. The definition allows 
only one trouble spot, and that at one of the end points. The first integral in 
(1) has only one trouble spot, but it is not an end point. The second causes 
trouble at both end points. It is necessary to broaden the definition. 


DEFINITION 10.2 Let f be defined on [a, b] except at a finite number of points. The improper
integral of f from a to b exists if there exist points a = a_0 < a_1 < · · · <
a_n = b such that each of the integrals

$$\int_{a_{i-1}}^{a_i} f(x)\,dx \tag{2}$$

exists, either as an ordinary integral or as an improper integral in the sense of
Definition 10.1.

It is important to understand that the definition requires each of the
integrals in (2) to exist. In some cases there is another way of looking at it
that seems equally reasonable, but is not correct. For instance, it is tempting
to say that

$$\int_{-1}^{1} \frac{dx}{x} = \lim_{a \to 0} \left( \int_{-1}^{-a} \frac{dx}{x} + \int_{a}^{1} \frac{dx}{x} \right) = \lim_{a \to 0} 0 = 0,$$

whereas, in fact, the integral does not exist.

Exercise 4 Prove that this improper integral does not exist.

Exercise 5 Decide whether the improper integrals in formula (1) exist.

Exercise 6 For what values of p do the improper integrals

$$\int_0^1 x^p\,dx, \qquad \int_1^\infty x^p\,dx, \qquad \int_0^\infty x^p\,dx$$

exist?

Exercise 7 Write down the integral for the arc length of the curve y = x sin (1/x), 0 < x ≤ 1.
Does the improper integral exist? Does the arc length exist?
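The warning about the tempting symmetric limit for dx/x on [−1, 1] can be illustrated numerically. The sketch below uses the antiderivative log |x|, which is an assumption brought in from elementary calculus, and the helper name `integral_of_reciprocal` is illustrative. It shows that the piece over [a, 1] grows without bound as a shrinks, that cutting out (−a, a) gives exact cancellation, and that cutting out (−a, 2a) instead gives a different answer.

```python
import math

def integral_of_reciprocal(lo, hi):
    """Integral of dx/x over [lo, hi], where lo and hi have the same sign."""
    if lo * hi <= 0:
        raise ValueError("the interval must not contain 0")
    return math.log(abs(hi)) - math.log(abs(lo))

rows = []
for a in (1e-1, 1e-3, 1e-6):
    right = integral_of_reciprocal(a, 1.0)  # equals -log(a): unbounded as a -> 0
    symmetric = integral_of_reciprocal(-1.0, -a) + right
    lopsided = integral_of_reciprocal(-1.0, -a) + integral_of_reciprocal(2 * a, 1.0)
    rows.append((a, right, symmetric, lopsided))

for row in rows:
    print(row)
```

The symmetric column is 0 and the lopsided column is −log 2 for every a, so the value obtained depends on how the trouble spot is cut out. This is one way to see why Definition 10.2 asks each piece to exist separately.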




It is not difficult to develop a theory of improper integrals, but neither is it 
very interesting. We shall rely on the examination of individual cases and on 
the reader’s common sense. (As a matter of fact, there are fascinating problems 
about improper integrals, but they are not the kind of problems we have been 
facing so far.) As a rule, when we speak of integrable functions or of integrals 
existing, we shall have in mind the original definitions and not improper integrals, 
unless it is plain from the context that these are included. 




5 | Taylor's Formula 


1 TAYLOR’S FORMULA


Taylor’s formula deals with the approximation of arbitrary differentiable func- 
tions by polynomials. There are many kinds of approximation in mathematics, 
each with particular advantages designed for particular problems. The one 
studied here is designed to approximate very well in a small neighborhood of a 
given point. It is achieved by matching the function and its derivatives to a 
certain order at the point. Note that the tangent line is exactly this sort of 
thing. It is obtained by matching the function and the derivative at the point. 
The first question is how to manage the additional matching. 


THEOREM 1.1 Suppose that the derivatives f, f′, . . . , f^(m) all exist at the point a. Then
the polynomial

$$P(x) = \sum_{k=0}^{m} \frac{f^{(k)}(a)}{k!}\,(x - a)^k$$

satisfies P^(k)(a) = f^(k)(a) for k ≤ m and is the only polynomial of degree ≤ m
that does. This polynomial is called the Taylor polynomial of f of degree m
at a, and is written T_a^m f.


The proof of the theorem rests on the following lemma.

LEMMA 1.2 If $Q(x) = \sum_{k=0}^{m} a_k (x - a)^k$, then $a_k = Q^{(k)}(a)/k!$.

Proof (By induction on the number m) We have

$$Q'(x) = \sum_{k=1}^{m} k a_k (x - a)^{k-1},$$

so the induction hypothesis gives that

$$k a_k = \frac{(Q')^{(k-1)}(a)}{(k-1)!} = \frac{Q^{(k)}(a)}{(k-1)!}.$$

This gives the assertion of the lemma if k > 0. The assertion is obvious
if k = 0. (Simply set x = a.)


The serious question is how well the Taylor polynomial does approximate f.

THEOREM 1.3 (Taylor’s Formula) Let f, f′, . . . , f^(m+1) all exist on an open interval I.
If a and x are any two points of I, then there is a point ξ between them such that

$$f(x) = T_a^m f(x) + \frac{f^{(m+1)}(\xi)}{(m+1)!}\,(x - a)^{m+1}. \tag{1}$$

Note that for m = 0, this formula is nothing but the mean-value theorem.
Therefore, the theorem is true for m = 0, and we are in a position to use
induction. The induction is based on the mean-value theorem and on the fact that

$$(T_a^m f)' = T_a^{m-1} f'. \tag{2}$$

Exercise 1 Prove formula (2).


Proof of the Theorem We shall apply the mean-value theorem in the strong form 4.2 of Chapter 3
to the function

$$g(x) = f(x) - T_a^m f(x)$$

and the function h(x) = (x − a)^(m+1). It says that for any point x there is
a point y between a and x such that

$$\frac{f(x) - T_a^m f(x)}{(x - a)^{m+1}} = \frac{g(x) - g(a)}{h(x) - h(a)} = \frac{g'(y)}{h'(y)} = \frac{g'(y)}{(m+1)(y - a)^m}.$$

Formula (2) shows that

$$g'(y) = f'(y) - T_a^{m-1} f'(y).$$

If we assume as an induction hypothesis that the theorem holds for m − 1,
then we find that there is a point ξ between a and y such that

$$f'(y) - T_a^{m-1} f'(y) = \frac{f^{(m+1)}(\xi)}{m!}\,(y - a)^m.$$

The last three formulas give just what is required.


Taylor’s formula provides the means for interesting calculations. To
calculate the value of a function f at a point x, we look for a point a at which we
can calculate the value of f and of all derivatives. Then we can calculate
T_a^m f(x), and the idea is to use Taylor’s formula to evaluate the error, or remainder,
f(x) − T_a^m f(x). Because of the factor (x − a)^(m+1), the closer we can take a to x,
the easier it will be to show that the error is small.

Example 1 Calculate e to within 0.01.

If f(x) = e^x, then f^(k)(x) = e^x for all k. The one point at which we know
the value of e^x is x = 0, so we write Taylor’s formula with a = 0, which is

$$e^x = \sum_{k=0}^{m} \frac{x^k}{k!} + \frac{e^{\xi}}{(m+1)!}\,x^{m+1}. \tag{3}$$

We do not know what ξ is, but we do know that it is between 0 and x. In the
present case we are interested in x = 1, so (Exercise 6, Section 7 of Chapter 2)

$$e^{\xi} < e^{1} = e < 3.$$

Therefore, the error term that cannot be evaluated explicitly is in any event
less than 3/(m + 1)!, and we have only to choose m so that this is less than 0.01.
If m = 5, then

$$\frac{3}{(m+1)!} = \frac{3}{6!} = \frac{1}{240} < 0.005.$$

Therefore, to within 0.005 we have

$$e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \frac{1}{5!} = \frac{163}{60} = 2.71666\ldots,$$

so

$$e = 2.72 \text{ to within } 0.01. \tag{4}$$
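The arithmetic of this example can be reproduced exactly with rational numbers. The sketch below searches for the smallest m with 3/(m + 1)! < 0.01 and then sums the Taylor polynomial at x = 1; the function names are illustrative, and `math.e` is used only to confirm that the true error stays under the bound.

```python
import math
from fractions import Fraction

def exp_taylor_at_one(m):
    """The Taylor polynomial of e^x of degree m at a = 0, evaluated at x = 1."""
    return sum(Fraction(1, math.factorial(k)) for k in range(m + 1))

def remainder_bound(m):
    """The bound 3/(m+1)! on the error term, using e < 3."""
    return Fraction(3, math.factorial(m + 1))

# Smallest degree for which the bound drops below 0.01.
m = next(m for m in range(20) if remainder_bound(m) < Fraction(1, 100))
approx = exp_taylor_at_one(m)
print(m, approx, float(approx), float(remainder_bound(m)))
```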


Example 2 Calculate sin 36° to within 0.001. If f(x) = sin x, then

$$f^{(k)}(x) = \begin{cases} \pm\cos x & \text{if } k \text{ is odd}, \\ \pm\sin x & \text{if } k \text{ is even}. \end{cases}$$

Since both the sin and the cos are at most 1 in absolute value, we have the
following evaluation for the error term:

$$\left| \frac{f^{(m+1)}(\xi)}{(m+1)!}\,(x - a)^{m+1} \right| \le \frac{|x - a|^{m+1}}{(m+1)!}.$$

We must choose a point a for which we know both sin a and cos a, and it will be
helpful to choose a as close to 36° as possible. The obvious choice is a =
30° = π/6, in which case x − a = 6° = π/30. Now, we must choose m so that

$$\frac{1}{(m+1)!}\left(\frac{\pi}{30}\right)^{m+1} < 0.001.$$

This will be true for m = 2. Therefore, to within 0.001,

$$\sin 36^\circ = \frac{1}{2} + \frac{\sqrt{3}}{2}\left(\frac{\pi}{30}\right) - \frac{1}{4}\left(\frac{\pi}{30}\right)^2.$$

Exercise 2 What value of m is needed in order to do the above calculation with a = 0
instead of a = π/6?
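For the choice a = π/6 and m = 2 made in Example 2, the estimate can be checked in a few lines. This is a sketch only: the function name `sin_taylor` is illustrative, and `math.sin` is used both for the known values at a and, at the end, to compare the approximation with the library value.

```python
import math

def sin_taylor(x, a, m):
    """Taylor polynomial of sin of degree m at a, evaluated at x.

    The derivatives of sin repeat with period 4: sin, cos, -sin, -cos.
    """
    derivs = (math.sin(a), math.cos(a), -math.sin(a), -math.cos(a))
    return sum(
        derivs[k % 4] * (x - a) ** k / math.factorial(k) for k in range(m + 1)
    )

x, a, m = math.radians(36), math.pi / 6, 2
approx = sin_taylor(x, a, m)
bound = abs(x - a) ** (m + 1) / math.factorial(m + 1)
print(approx, math.sin(x), bound)
```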


2 EQUIVALENT FORMULAS


Taylor’s formula, or at least others equivalent to it, can sometimes be obtained
by tricks. For instance, let

$$S = 1 + y + y^2 + \cdots + y^m = \sum_{k=0}^{m} y^k,$$
$$yS = y + y^2 + \cdots + y^{m+1}.$$

Then S − yS = 1 − y^(m+1), so

$$S = \frac{1 - y^{m+1}}{1 - y} = \frac{1}{1 - y} - \frac{y^{m+1}}{1 - y};$$

therefore,

$$\frac{1}{1 - y} = \sum_{k=0}^{m} y^k + \frac{y^{m+1}}{1 - y} \qquad \text{for } y \ne 1. \tag{1}$$

This is a little better than Taylor’s formula, since the error term is quite explicit
and does not involve an unknown point ξ.
Substitution of y = −x² in formula (1) gives

$$\frac{1}{1 + x^2} = \sum_{k=0}^{m} (-1)^k x^{2k} + (-1)^{m+1}\,\frac{x^{2m+2}}{1 + x^2}, \tag{2}$$

and again the same comment applies. Integration of (2) gives a formula for
the arctan:

$$\arctan x = \int_0^x \frac{dt}{1 + t^2} = \sum_{k=0}^{m} (-1)^k \frac{x^{2k+1}}{2k+1} + (-1)^{m+1} \int_0^x \frac{t^{2m+2}}{1 + t^2}\,dt. \tag{3}$$
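Formulas (1) and (3) lend themselves to a quick check. The sketch below verifies (1) exactly with rational numbers and then compares the polynomial part of (3) with `math.atan` at x = 1/2. The bound |x|^(2m+3)/(2m + 3) on the remainder is not stated in the text; it is an added observation that follows from 1 + t² ≥ 1. The function names are illustrative.

```python
import math
from fractions import Fraction

def geometric_gap(y, m):
    """Left side minus right side of formula (1); it should vanish for y != 1."""
    partial = sum(y ** k for k in range(m + 1))
    return 1 / (1 - y) - (partial + y ** (m + 1) / (1 - y))

def arctan_partial_sum(x, m):
    """Polynomial part of formula (3)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(m + 1))

gaps = [geometric_gap(Fraction(p, 7), m) for p in (-9, -3, 2, 5) for m in range(6)]

x, m = 0.5, 4
error = abs(math.atan(x) - arctan_partial_sum(x, m))
bound = abs(x) ** (2 * m + 3) / (2 * m + 3)  # since t^(2m+2)/(1+t^2) <= t^(2m+2)
print(set(gaps), error, bound)
```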


Exercise 1 Calculate π to within 0.01. See what value of m will be needed if you use the
fact that π/4 = arctan 1, and then think of a better idea.


Formulas (1), (2), and (3) are not visibly Taylor’s formula, since the error, 
or remainder, terms do not have quite the form envisioned in Taylor’s formula. 
However, they serve the same purpose, and, indeed, the polynomial parts are 
exactly the Taylor polynomials. This is no accident. 


THEOREM 2.1 If f, f′, . . . , f^(m) all exist at a, then for every positive number ε there is a
positive number δ such that

$$|f(x) - T_a^m f(x)| \le \varepsilon\,|x - a|^m \qquad \text{if } |x - a| < \delta. \tag{4}$$

Moreover, T_a^m f is the only polynomial of degree ≤ m for which this is true.

Proof Formula (4) is established by induction. The case m = 1 is nothing more
than the definition of the derivative. The induction hypothesis and
formula (2) of Section 1 show that if g = f − T_a^m f, then |g′(x)| ≤ ε|x − a|^(m−1)
if |x − a| < δ. Since g(a) = 0, integration from a to x gives the required
result.

Exercise 2 If m = 2, the integration is not legitimate, since nothing guarantees that g′ is
integrable. Complete the argument in this case.

In order to prove the other half of the theorem, let P be any polynomial
satisfying (4), let Q = T_a^m f − P, and use the following lemma.

LEMMA 2.2 If a polynomial Q of degree ≤ m satisfies

$$\lim_{\substack{x \to a \\ x \ne a}} \frac{Q(x)}{(x - a)^m} = 0, \tag{5}$$

then Q = 0.


Proof Let

$$Q(x) = \sum_{k=0}^{m} a_k (x - a)^k. \tag{6}$$

If a_j is the first nonvanishing coefficient, then

$$Q(x) = (x - a)^j \sum_{k=j}^{m} a_k (x - a)^{k-j} = (x - a)^j R(x),$$

where R is a polynomial with R(a) = a_j ≠ 0. It is clear that condition
(5) cannot possibly be satisfied. Therefore, there cannot be a first
nonvanishing coefficient; so all coefficients are 0, and Q is identically 0.

Remark Both here and in the proof of Theorem 1.1 we have used the fact that a
polynomial of degree ≤ m can be put in the form (6). A polynomial of degree
≤ m is defined to be a function for which there are numbers a_0, . . . , a_m such
that

$$Q(x) = \sum_{k=0}^{m} a_k x^k.$$

To show that such a function can be put in the form (6) we have only to write
x = (x − a) + a, substitute in this formula, and multiply out.

Exercise 3 Show that Theorem 2.1 implies that the polynomials in formulas (1), (2), and
(3) are the Taylor polynomials. Use this result to calculate f^(k)(0) in each case.


From Theorem 2.1 we can deduce the formula for the kth derivative of a
product of two functions. It is easy to see that

$$T_a^m f(x)\,T_a^m g(x) = \sum_{k=0}^{m} c_k (x - a)^k + P(x)(x - a)^{m+1},$$

where P is a polynomial and

$$c_k = \sum_{i=0}^{k} \frac{f^{(i)}(a)}{i!}\,\frac{g^{(k-i)}(a)}{(k-i)!}.$$

Now, f = T_a^m f + R and g = T_a^m g + S, where R and S have the property that
for every positive number ε there is a positive number δ such that |R(x)| ≤
ε|x − a|^m and |S(x)| ≤ ε|x − a|^m if |x − a| < δ. Therefore,

$$f(x)g(x) = \sum_{k=0}^{m} c_k (x - a)^k + P(x)(x - a)^{m+1} + T_a^m f(x)\,S(x) + T_a^m g(x)\,R(x) + R(x)S(x),$$


which shows that for every positive number ε there is a positive number δ such
that

$$\left| f(x)g(x) - \sum_{k=0}^{m} c_k (x - a)^k \right| \le \varepsilon\,|x - a|^m \qquad \text{for } |x - a| < \delta.$$

By Theorem 2.1, this polynomial is the Taylor polynomial of fg, and we have
the following theorem.

THEOREM 2.3 If f and g have derivatives of order up to m at a, then for k ≤ m

$$(fg)^{(k)}(a) = \sum_{i=0}^{k} \frac{k!}{i!\,(k-i)!}\,f^{(i)}(a)\,g^{(k-i)}(a).$$
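Reading the coefficient c_k of the product polynomial as (fg)^(k)(a)/k! gives the product rule for higher derivatives, (fg)^(k)(a) = Σ k!/(i!(k − i)!) f^(i)(a) g^(k−i)(a). The sketch below tries this on f(x) = e^x and g(x) = sin x at a = 0. The closed form 2^(k/2) sin(kπ/4) for the kth derivative of e^x sin x at 0 is a standard fact brought in for comparison, not something taken from the text, and the name `leibniz` is illustrative.

```python
from math import comb, isclose, pi, sin

def leibniz(f_derivs, g_derivs, k):
    """kth derivative of a product from the derivatives of the factors at one point."""
    return sum(comb(k, i) * f_derivs[i] * g_derivs[k - i] for i in range(k + 1))

m = 8
exp_derivs = [1] * (m + 1)                                # derivatives of e^x at 0
sin_derivs = [(0, 1, 0, -1)[k % 4] for k in range(m + 1)]  # derivatives of sin x at 0

product_derivs = [leibniz(exp_derivs, sin_derivs, k) for k in range(m + 1)]
expected = [2 ** (k / 2) * sin(k * pi / 4) for k in range(m + 1)]
print(product_derivs)
```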


Exercise 4 Prove Theorem 2.3 by induction.

Exercise 5 Start with the identity

$$f(x) = f(a) + \int_a^x f'(t)\,dt$$

and integrate by parts to obtain

$$f(x) = f(a) + f'(a)(x - a) + \int_a^x f''(t)(x - t)\,dt.$$

Continue to integrate by parts to obtain

$$f(x) = T_a^m f(x) + \frac{1}{m!} \int_a^x f^{(m+1)}(t)(x - t)^m\,dt. \tag{7}$$

This is called Taylor’s formula with the remainder in integral form. State
carefully the hypotheses that are needed to carry through the proof.

Exercise 6 Suppose that f and p are continuous on [a, b] and that p ≥ 0. Show that there
is a point ξ between a and b such that

$$\int_a^b f(t)p(t)\,dt = f(\xi) \int_a^b p(t)\,dt.$$

This is called the mean-value theorem for integrals.

Exercise 7 Use formula (7) and the mean-value theorem for integrals to derive the original
form of Taylor’s formula, but under slightly worse hypotheses.


3 LOCAL MAXIMA AND MINIMA


If a function f has a maximum or minimum at a point a, and if f′(a) exists, then
f′(a) = 0. On the other hand, f′(a) may well be 0 without there being a
maximum or minimum at a. What we want now is a supplementary condition
to add to the condition f′(a) = 0 that will guarantee a maximum or minimum.
The supplementary condition will bear on the higher derivatives of f at a.
Since these depend only on the behavior of f near a, the notions of maximum
and minimum must be modified.

DEFINITION 3.1 A function f on a set S has a local minimum at the point a ∈ S if there is a
positive number δ such that

$$f(x) \ge f(a) \qquad \text{if } x \in S \text{ and } |x - a| < \delta.$$


THEOREM 3.2 Suppose that f′, . . . , f^(m−1) are all 0 at a and that f^(m) exists but is not 0 at a.
(a) If m is odd, then f has neither a local maximum nor a local
minimum at a.
(b) If m is even, then f has one or the other: a minimum if f^(m)(a) > 0,
and a maximum if f^(m)(a) < 0.

Example 1 f(x) = x³ has neither a local maximum nor a local minimum at x = 0, because
the first nonvanishing derivative is the third, which is odd.

Example 2 f(x) = x⁴ has a local minimum at x = 0 because the first nonvanishing
derivative is the fourth, which is even, and its value is 24, which is positive.
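The test for local maxima and minima by the first nonvanishing derivative is mechanical enough to write down as a small routine. The sketch below is restricted, as an assumption for illustration, to polynomials given by their coefficients in powers of (x − a), so that the kth derivative at a is k! times the kth coefficient; the name `classify_critical_point` is hypothetical.

```python
from math import factorial

def classify_critical_point(coeffs):
    """Classify the point a for f(x) = sum of coeffs[k] * (x - a)^k.

    Returns (m, value, verdict), where m is the order of the first nonvanishing
    derivative at a and value is that derivative, or None if the derivatives of
    orders 1, 2, ... all vanish at a.
    """
    for m in range(1, len(coeffs)):
        value = factorial(m) * coeffs[m]
        if value != 0:
            if m % 2 == 1:
                return m, value, "neither"
            return m, value, "local minimum" if value > 0 else "local maximum"
    return None

print(classify_critical_point([0, 0, 0, 1]))     # x^3 at 0
print(classify_critical_point([0, 0, 0, 0, 1]))  # x^4 at 0
print(classify_critical_point([5, 0, -1]))       # 5 - x^2 at 0
```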


Proof of the Theorem Write out the inequality in Theorem 2.1 with

$$\varepsilon = \frac{1}{2}\,\frac{|f^{(m)}(a)|}{m!}.$$

Since f′, . . . , f^(m−1) are all 0 at a, it reads

$$\left| f(x) - f(a) - \frac{f^{(m)}(a)}{m!}\,(x - a)^m \right| \le \frac{1}{2}\,\frac{|f^{(m)}(a)|}{m!}\,|x - a|^m \qquad \text{if } |x - a| < \delta. \tag{1}$$

This implies that f(x) − f(a) has the same sign as [f^(m)(a)/m!](x − a)^m.
If m is odd, the sign changes as x crosses from one side of a to the other, so
there can be neither a maximum nor a minimum. If m is even, the sign
remains the same, so there must be one or the other. For example, if
f^(m)(a) > 0, then f(x) − f(a) is positive for all x ≠ a near a; hence f(x) > f(a)
for all x ≠ a near a, and f has a strict local minimum.


Theorem 3.2 does cover most cases, but not all. It may happen, for
instance, that f′, . . . , f^(m−1) are all 0 at a, while f^(m) does not exist. In this case
anything can happen.

Exercise 1 Give examples.

It may also happen that f^(k)(a) = 0 for every k. Consider the function

$$f(x) = e^{-1/x^2} \text{ for } x \ne 0, \qquad f(0) = 0.$$


This function has derivatives of every order at every point, including 0, and all
derivatives vanish at 0. To establish these properties, carry out the following
exercises.

Exercise 2 For each positive integer k,

$$f^{(k)}(x) = P(1/x)\,f(x) \text{ for } x \ne 0, \qquad P \text{ a polynomial}.$$

Exercise 3 For each positive integer m,

$$\lim_{\substack{x \to 0 \\ x \ne 0}} \frac{f(x)}{x^m} = 0.$$

Exercise 2 is handled immediately by induction. Exercise 3 reduces to

$$\lim_{y \to \infty} y^m e^{-y^2} = 0,$$

which can be handled by l’Hospital’s rule.

The function f obviously has a minimum at 0. The function −f has the
same properties and has a maximum. The function

$$g(x) = \begin{cases} f(x) & \text{if } x \ge 0, \\ -f(x) & \text{if } x \le 0 \end{cases}$$

also has the same properties and has neither.


6 : Sequences and Series 


1 SEQUENCES AND SERIES


DEFINITION 1.1 A sequence in a set S is a function from the positive integers into S. If s is a
sequence, it is customary to write s_n for s(n) and {s_n} for s.

Eventually we shall discuss sequences in a variety of sets S, but for the
present the important one is the set of real numbers. In this case the limit at
∞ is defined as it was in Section 5 of Chapter 3.

DEFINITION 1.2 If {s_n} is a sequence of real numbers, then lim_{n→∞} s_n = l, or s_n → l, if for
every positive number ε there is a positive integer n_0 such that if n ≥ n_0, then
|s_n − l| < ε. If the limit exists, the sequence is said to converge. If it does
not, the sequence is said to diverge.


Example 1 The sequence {(1 + (1/n))^n} is strictly increasing and has the limit e.

To say that {s_n} is strictly increasing means that s_{n+1} > s_n for every n. To
say it is increasing means that s_{n+1} ≥ s_n for every n. The assertion of the
example follows from Example 4 at the end of Section 5 of Chapter 3.

Before taking up other examples, let us prove an inequality that gives a
pretty good idea how fast n! increases.

LEMMA 1.3 n! > (n/e)^n for n ≥ 1.

The lemma is proved by induction. Assume that it holds for n, and
multiply both sides by n + 1 to get

$$(n+1)! > \left(\frac{n}{e}\right)^n (n+1).$$

Therefore, it is sufficient to show that

$$\left(\frac{n}{e}\right)^n (n+1) \ge \left(\frac{n+1}{e}\right)^{n+1},$$

or, in other words, that

$$e \ge \left(1 + \frac{1}{n}\right)^n.$$

This follows from Example 1.
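Both the monotone sequence of Example 1 and the inequality n! > (n/e)^n are easy to probe numerically. The sketch below checks them over a finite range, which of course proves nothing; the ranges (n < 200 and n < 100) and the name `compound_term` are arbitrary illustrative choices.

```python
import math

def compound_term(n):
    """The nth term (1 + 1/n)^n of the sequence in Example 1."""
    return (1 + 1 / n) ** n

terms = [compound_term(n) for n in range(1, 200)]
strictly_increasing = all(t1 < t2 for t1, t2 in zip(terms, terms[1:]))
below_e = all(t < math.e for t in terms)

# Lemma 1.3 on a finite range: n! > (n/e)^n.
lemma_holds = all(math.factorial(n) > (n / math.e) ** n for n in range(1, 100))
print(terms[0], terms[-1], strictly_increasing, below_e, lemma_holds)
```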


Example 2 For each real x, x^n/n! → 0.
This follows from the lemma, for

$$\left| \frac{x^n}{n!} \right| \le \left| \frac{ex}{n} \right|^n \le \left| \frac{ex}{n} \right| \qquad \text{if } n \ge |ex|,$$

and the term on the right is certainly as small as we please when n is large.

Example 3 For each positive integer n and each real x, let

$$s_n(x) = \sum_{k=0}^{n} \frac{x^k}{k!}.$$

For each real x, we have s_n(x) → e^x.

It is not surprising that what is involved here is Taylor’s formula, for s_n is
the Taylor polynomial for e^x. Taylor’s formula gives

$$e^x - s_n(x) = \frac{e^{\xi} x^{n+1}}{(n+1)!} \qquad \text{with } |\xi| \le |x|.$$

The result follows from Example 2.
This example suggests that it should be profitable to define a notion of
“infinite sum.”


DEFINITION 1.4 Let {a_k} be a sequence of real numbers. The sequence {s_n} defined by

$$s_n = \sum_{k=0}^{n} a_k$$

is called the series associated with the sequence {a_k}. The limit of {s_n} is
called the sum of the series and is written $\sum_{k=0}^{\infty} a_k$, provided, of course, the
limit exists. If the limit does exist, the series converges; if not, it diverges.

In terms of this definition the result of Example 3 is that

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \qquad \text{for all real } x. \tag{1}$$

Establish the following formulas:

$$\sin x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!} \qquad \text{for all real } x. \tag{2}$$

$$\cos x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!} \qquad \text{for all real } x. \tag{3}$$

$$\frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k \qquad \text{for } |x| < 1. \tag{4}$$

$$\log(1 - x) = -\sum_{k=1}^{\infty} \frac{x^k}{k} \qquad \text{for } |x| < 1. \tag{5}$$

$$\arctan x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{2k+1} \qquad \text{for } |x| < 1. \tag{6}$$

These series are called Taylor series. The proofs of the formulas call for
an evaluation of the error, or remainder, term in Taylor’s formula, or one of
its equivalents from Section 2 of the last chapter.

In general, if f is a function which has derivatives of all orders at a point a, 
then the Taylor series of f at the point a is the series 


$$\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}\,(x - a)^k.$$

There are a couple of points to be wary of here. First, the series may not
converge for any value of x (except x = a). Second, even if it does converge,
the sum may not be f(x). For the time being at least, the way to show that the
series does converge and that its sum is f(x) is to estimate the remainder in
Taylor’s formula. One nasty example is the function

$$f(x) = e^{-1/x^2} \text{ for } x \ne 0, \qquad f(0) = 0,$$

considered at the end of the last section. It was shown in the exercises there
that f^(k)(0) = 0 for every k. Therefore, the Taylor series at 0 is the series of 0’s
for every x. This certainly converges for every x and has the sum 0, which is
not at all equal to f(x).

Some ambiguity in the notations may have been noticed. According to
Definition 1.1, a sequence {a_n} should start with the term a_1, while in Definition
1.4, and in all the Taylor series, it starts with a_0. Obviously, the starting point
does not matter.

The same symbol $\sum_{k=0}^{\infty} a_k$ has been used both for the sum of the series and
for the series itself. According to the definition, it should stand for the sum,
and some other symbol, such as Σa_k, should be used for the series itself. We
shall make use of this convention but not very systematically. The ambiguity
does exist throughout the mathematical literature and must be borne.

The number a_n is called the nth term of the sequence {a_n}. It is also called
the nth term of the series Σa_k, which is another ambiguity. It ought to be the
number

$$s_n = \sum_{k=0}^{n} a_k,$$

which is called the nth term of the series. The latter is called the nth partial
sum.

From the definition of a series as a sequence of partial sums, it is plain that
any theorem about convergence of sequences leads to a corresponding theorem
about convergence of series. On the other hand, theorems about series also
lead to theorems about sequences, for if s_n is the nth partial sum of the series
Σa_k, then a_n = s_n − s_{n−1}; so the sequence {a_n} can be recovered from the series
Σa_k. A second way to recover a given sequence {a_n} from a series, although
not the corresponding series of partial sums, is the formula

$$a_n = a_1 + \sum_{k=2}^{n} (a_k - a_{k-1}).$$

Thus, the study of sequences is more or less equivalent to the study of series.
Often, however, a theorem that is interesting and natural for one of the two
becomes very awkward when it is rephrased for the other.


2 INCREASING SEQUENCES AND POSITIVE SERIES


With nothing but the definition at hand, there would be serious difficulties
standing in the way of a satisfactory theory of convergence. To decide whether
a given sequence converges, one must know ahead of time the limit to which it
converges! For any given number l, Definition 1.2 gives the test to determine
whether lim_{n→∞} s_n = l, which is useless without knowing what number l to
test. (Test them all?)
Consider, for example, the series

$$\sum_{k=0}^{\infty} \frac{1}{k!}. \tag{1}$$

Taylor’s formula shows that the series does converge and that the sum is e.
Looking just at the series, however, and not at Taylor’s formula, there would
be no way to guess that the sum is e and no way to prove that the series converges.
For the series

$$\sum_{k=1}^{\infty} \frac{1}{k^2} \tag{2}$$

there is no way to guess the sum and no way to prove that the series converges.

What are needed are criteria of convergence that bear only on the sequence
or series itself.


THEOREM 2.1 Every convergent sequence is bounded.

THEOREM 2.2 Every bounded increasing sequence converges to its least upper bound.

Proofs Suppose that lim_{n→∞} s_n = l. Take any positive ε, say ε = 1, and find a
corresponding integer n_0 such that

$$|s_n - l| < 1, \quad\text{hence}\quad |s_n| < |l| + 1 \qquad \text{if } n \ge n_0.$$

Then M = max(|s_1|, |s_2|, . . . , |s_{n_0}|, |l| + 1) is a bound.

Suppose that {s_n} is increasing (that is, s_{n+1} ≥ s_n) and bounded, and
let l be the least upper bound. Let ε be a given positive number. Then
l − ε is no longer an upper bound, so there exists n_0 with s_{n_0} > l − ε.
If n ≥ n_0, then

$$l \ge s_n \ge s_{n_0} > l - \varepsilon.$$

These two theorems have a wide range of application, coming partly from
the fact that a series with nonnegative terms always has increasing partial sums.


Exercise 1 Show that $\sum_{k=1}^{\infty} 1/k$ diverges (that is, does not converge) by showing that

$$\sum_{k=1}^{n-1} \frac{1}{k} = \overline{S}\!\left(p, \frac{1}{x}\right) \ge \int_1^n \frac{dx}{x},$$

where p is the partition (1, 2, . . . , n) of the interval [1, n]. Draw a picture.

Exercise 2 Show that $\sum_{k=1}^{\infty} 1/k^2$ converges by showing that

$$\sum_{k=2}^{n} \frac{1}{k^2} = \underline{S}\!\left(p, \frac{1}{x^2}\right) \le \int_1^n \frac{dx}{x^2}.$$

Exercise 3 For what values of p does the series $\sum_{k=1}^{\infty} k^{-p}$ converge?

Exercise 4 What about the series $\sum_{k=2}^{\infty} 1/(k \log k)$?

Exercise 5 If 0 ≤ a_k ≤ b_k holds for all large k, and Σb_k converges, then Σa_k
converges.

The result of the last exercise is called the comparison test. Probably it is
the most valuable convergence test there is, but plainly the value depends on
having a large stock of series to use in comparison.

Exercise 6 Show by the comparison test that Σ 1/k! and Σ 2^(−k) converge.
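The comparisons with integrals that lie behind the divergence of Σ 1/k and the convergence of Σ 1/k² can be watched numerically. The sketch below tabulates the partial sums next to log n (the integral of dx/x over [1, n]) and 1 − 1/n (the integral of dx/x² over [1, n]); both closed forms are standard calculus facts brought in for comparison, and the name `partial_sum` is illustrative. A table of this kind illustrates the inequalities without proving them.

```python
import math

def partial_sum(term, first, last):
    """Sum of term(k) for k = first, ..., last."""
    return sum(term(k) for k in range(first, last + 1))

rows = []
for n in (10, 100, 1000, 10000):
    harmonic = partial_sum(lambda k: 1 / k, 1, n - 1)   # an upper sum for 1/x on [1, n]
    squares = partial_sum(lambda k: 1 / k ** 2, 2, n)   # a lower sum for 1/x^2 on [1, n]
    rows.append((n, harmonic, math.log(n), squares, 1 - 1 / n))

for row in rows:
    print(row)
```

The harmonic column keeps pace with log n and so grows without bound, while the squares column is an increasing sequence that stays below 1.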


3 CAUCHY SEQUENCES


Theorems 2.1 and 2.2 tell the whole story for increasing sequences and for series 
with positive terms. Together they give a simple, necessary, and sufficient 
condition for convergence. Obviously, they also do the job for decreasing 
sequences and series with negative terms. But they do nothing whatever for 
general sequences and series. 

The problem is to get one’s fingers on the limit. In the case of an increasing 
sequence this is accomplished easily by taking the least upper bound. In the 
general case it is more complicated. 


Tf {sn} is a bounded sequence, let 
bn = sup{s,:k > n} and b = inf{d,}. 


The number b is called the limit superior of the sequence {s,} and is written 
lia sup sa, 


The point is that every bounded sequence has a limit superior, and if the 
sequence happens to have a limit, then it must be the limit superior. This 
notion appeared already in formula (4), Section 2 of Chapter 3, at which point 
the following lemma was proved. 
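As an illustration of the definition of the limit superior, the Python sketch below approximates the numbers $b_n = \sup\{s_k : k \ge n\}$ for a bounded sequence with no limit. The helper `tail_suprema`, the example sequence `s`, and the finite `window` that stands in for the infinite tail are all assumptions made for this sketch; for a general sequence a finite window gives only a lower estimate of each $b_n$.

```python
def tail_suprema(s, n_max, window):
    """Approximate b_n = sup{s_k : k >= n} by a maximum over n <= k < n + window."""
    return [max(s(k) for k in range(n, n + window)) for n in range(1, n_max + 1)]

def s(k):
    # Hypothetical example: s_k = (-1)^k (1 + 1/k) oscillates and has no limit.
    return (-1) ** k * (1 + 1 / k)

b = tail_suprema(s, 50, 1000)
# The b_n decrease toward 1, which is the limit superior of this sequence.
print(b[0], b[9], b[49])
```

For this particular sequence the supremum of each tail is attained at the first even index, so the windowed maximum is exact.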




LEMMA 3.2 Let $b = \limsup s_n$. For every positive number $\delta$ there is a positive integer $k$ such that

$$k \ge \frac{1}{\delta} \quad\text{and}\quad |b - s_k| < \delta.$$


The main theorem on the convergence of general sequences is as follows: 


THEOREM 3.3 The sequence $\{s_n\}$ converges if and only if it has the following property: For every positive number $\varepsilon$ there is a positive integer $n_0$ such that if $n \ge n_0$ and $m \ge n_0$, then $|s_n - s_m| < \varepsilon$.


A sequence with this property is called a Cauchy sequence (after a Frenchman 
named Cauchy). Two easy parts of the theorem are left as exercises. 


Exercise 1 Every convergent sequence is Cauchy. 


Exercise 2 Every Cauchy sequence is bounded.


Proof of the Theorem What remains to be proved is that every Cauchy sequence converges to its limit superior. Let $b = \limsup s_n$, and let $\varepsilon > 0$ be given. First choose $n_0$ so that if $n \ge n_0$ and $m \ge n_0$, then $|s_n - s_m| < \varepsilon$. Take any positive number $\delta$ smaller than $1/n_0$ and smaller than $\varepsilon$, and choose $k$ as in the lemma. If $n \ge n_0$, then we have

$$|b - s_n| \le |b - s_k| + |s_k - s_n| < \varepsilon + \varepsilon = 2\varepsilon.$$

Indeed, $|b - s_k| < \delta < \varepsilon$, while $|s_k - s_n| < \varepsilon$ follows from the fact that $k \ge 1/\delta > n_0$.


The theorem looks harmless enough, possibly not even interesting. This 
is deceptive. For instance, it allows the results on series with positive terms to 
be brought to bear on general series. 


THEOREM 3.4 If $\sum_{k=1}^{\infty} |a_k|$ converges, then so does $\sum_{k=1}^{\infty} a_k$.

Proof Let $s_n$ be the $n$th partial sum of the first series and $t_n$ be the $n$th partial sum of the second. For $n > m$ we have

$$|t_n - t_m| = \Big|\sum_{k=m+1}^{n} a_k\Big| \le \sum_{k=m+1}^{n} |a_k| = s_n - s_m,$$

from which it is plain that if $\{s_n\}$ is Cauchy, then so is $\{t_n\}$.


A series such that $\sum |a_k|$ converges is said to converge absolutely. It is pretty hard to prove that a series converges without proving that it converges absolutely, but there are examples. One, according to the following theorem and Exercise 1 of the last section, is the series $\sum_{k=1}^{\infty} (-1)^k (1/k)$.

THEOREM 3.5 If $\{a_k\}$ is decreasing and $\lim_{k\to\infty} a_k = 0$, then $\sum_{k=1}^{\infty} (-1)^k a_k$ converges.

Proof Let $s_n$ be the $n$th partial sum. What can be shown is that if $n > m$, then

$$|s_n - s_m| \le a_{m+1}, \tag{1}$$

which proves the theorem because of the assumption that $a_m \to 0$. To see that formula (1) holds, write

$$(-1)^{m+1}(s_n - s_m) = a_{m+1} - a_{m+2} + a_{m+3} - a_{m+4} + \cdots \pm a_n.$$

Group the terms in pairs, the first two together, the next two together, and so on. Since the sequence is decreasing, each pair is nonnegative, and if there is a term left over that cannot be paired, it too is nonnegative. Therefore,

$$(-1)^{m+1}(s_n - s_m) \ge 0.$$

Now group in pairs, but leave out the first term $a_{m+1}$. This time each pair is nonpositive, and if there is a term left over, it, too, is nonpositive. Therefore,

$$(-1)^{m+1}(s_n - s_m) - a_{m+1} \le 0, \quad\text{or}\quad (-1)^{m+1}(s_n - s_m) \le a_{m+1}.$$

The two inequalities together give (1).

Exercise 3 Show that the improper integral $\int_0^{\infty} (\sin x/x)\,dx$ exists. [Hint: Let $a_k = \big|\int_{k\pi}^{(k+1)\pi} (\sin x/x)\,dx\big|$ and use Theorem 3.5.]

Exercise 4 Define the limit inferior (lim inf) of a bounded sequence, and prove Lemma 3.2 for the lim inf.

Exercise 5 If $\sum a_k$ converges, then $a_k \to 0$.

Exercise 6 If $\sum a_k$ and $\sum b_k$ converge, or converge absolutely, then so do $\sum (a_k + b_k)$ and $\sum \alpha a_k$ ($\alpha$ real).
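The estimate behind Theorem 3.5, namely that the partial sums of $\sum (-1)^k a_k$ satisfy $|s_n - s_m| \le a_{m+1}$ for $n > m$ when $\{a_k\}$ decreases to 0, can be observed numerically. The Python sketch below does this for $a_k = 1/k$; the names `alternating_partial_sum`, `a`, and `max_gap` are hypothetical, and a finite range of $n$ stands in for "all $n > m$".

```python
import math

def a(k):
    """Example decreasing sequence with limit 0 (an assumption for this sketch)."""
    return 1.0 / k

def alternating_partial_sum(n):
    """s_n = sum of (-1)^k a(k) for k = 1..n."""
    return sum((-1) ** k * a(k) for k in range(1, n + 1))

def max_gap(m, n_max):
    """Largest |s_n - s_m| over m < n <= n_max."""
    s_m = alternating_partial_sum(m)
    return max(abs(alternating_partial_sum(n) - s_m) for n in range(m + 1, n_max + 1))

for m in (1, 5, 10, 50):
    print(m, max_gap(m, 200), a(m + 1))

# The sum of this particular series is -log 2.
print(alternating_partial_sum(10000), -math.log(2))
```

The largest gap is attained at $n = m + 1$, where it equals $a_{m+1}$ exactly (up to rounding).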


In many respects infinite sums behave much like finite sums, but in some 
respects you have to watch out. In a finite sum, for example, the order of the
terms is immaterial: 3 + 2 = 2 + 3. In an infinite sum this is not always
the case. To state the theorem, the following definition is needed. 


DEFINITION 3.6 A permutation of a set $X$ is a one-to-one function from $X$ onto itself.

THEOREM 3.7 If $\sum a_k$ converges absolutely, then for every permutation $\nu$ of the positive integers the series $\sum a_{\nu(k)}$ converges absolutely to the same sum.

So far so good, but see how strange things are for series that do not converge absolutely.

THEOREM 3.8 If the series $\sum a_k$ converges, but does not converge absolutely, then for any number $s$ whatever there is a permutation $\nu$ of the positive integers such that $\sum a_{\nu(k)} = s$.

Proof of Theorem 3.7 Let $s = \sum a_k$ and let $\varepsilon > 0$ be given. Choose $n_0$ so that

$$\sum_{k=n_0}^{n} |a_k| < \varepsilon \quad\text{if } n \ge n_0. \tag{2}$$

Exercise 7 Show that (2) implies that $\big|s - \sum_{k=1}^{n_0} a_k\big| \le \varepsilon$.

Now let $\nu$ be any permutation, and let

$$m_0 = \max\{\nu^{-1}(1), \ldots, \nu^{-1}(n_0)\}.$$

If $m \ge m_0$, then we have

$$\Big|s - \sum_{k=1}^{m} a_{\nu(k)}\Big| \le \Big|s - \sum_{k=1}^{n_0} a_k\Big| + \sum_{k=n_0}^{n} |a_k|,$$

where $n = \max\{\nu(1), \ldots, \nu(m)\}$. Formula (2) and Exercise 7 show that the right side is at most $2\varepsilon$ and prove the theorem.

Exercise 8 Theorem 3.8 is not as mysterious as it looks. Develop a proof along the following lines: Let $\{b_k\}$ be the sequence of nonnegative terms in the sequence $\{a_k\}$ and let $\{c_k\}$ be the sequence of negative terms (both picked out in order). Show that both series $\sum b_k$ and $\sum c_k$ diverge. Now pick out $b$'s and $c$'s according to the following plan: Pick just enough $b$'s to get a sum $> s$, then just enough $c$'s to get a total sum $< s$, then just enough $b$'s to get a total sum $> s$, and so on. Use Exercise 5 to show that the resulting series converges to $s$.
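The plan of Exercise 8 can be tried out on the series $\sum (-1)^k (1/k)$, which converges but not absolutely. The Python sketch below is an illustration under stated assumptions, not a proof: the function name `rearranged_partial_sums` is hypothetical, and the rule "add nonnegative terms while the running total is at most the target, negative terms otherwise" is a slight simplification of the plan in the exercise.

```python
from itertools import count

def rearranged_partial_sums(target, n_terms):
    """Partial sums of a greedy rearrangement of sum (-1)^k / k aimed at `target`."""
    nonneg = (1.0 / k for k in count(2, 2))   # the b's, in order: 1/2, 1/4, 1/6, ...
    neg = (-1.0 / k for k in count(1, 2))     # the c's, in order: -1, -1/3, -1/5, ...
    total = 0.0
    sums = []
    for _ in range(n_terms):
        if total <= target:
            total += next(nonneg)   # not yet above the target: take the next b
        else:
            total += next(neg)      # above the target: take the next c
        sums.append(total)
    return sums

for target in (1.0, -2.0, 0.25):
    print(target, rearranged_partial_sums(target, 20000)[-1])
```

Every term of the original series is used exactly once in the limit, because both the $b$'s and the $c$'s have divergent sums, so the running total keeps crossing the target. Each crossing overshoots by at most the size of the term just added, and those sizes tend to 0.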


In view of Theorem 3.7, it is natural to wonder whether the definition of absolute convergence cannot be made in a way that is independent of the order of the terms. Consider first the case where each $a_k \ge 0$. For each finite set $F$ of positive integers, let $s_F$ be the sum of the $a_k$'s with $k \in F$, and let $s = \sup s_F$, the upper bound being taken over all finite sets $F$.


Exercise 9 $s = \sum a_k$ if each $a_k \ge 0$ (in the sense that the equality holds if either is finite).

Exercise 10 Formulate a condition on the $s_F$'s that is equivalent to absolute convergence ($a_k$ not necessarily $\ge 0$, of course). (In proving that your condition works, you will have to use Theorem 3.8.)


4 SEQUENCES OF FUNCTIONS

In the case of sequences and series of functions a new problem comes up: to deduce properties of the limit from known properties of the individual terms. For instance, if each term is continuous, is the limit continuous? If each term is differentiable, is the limit differentiable and can we differentiate term by term? If each term is integrable, is the limit integrable and can we integrate term by term? Consider the Taylor series for the sine:

$$\sin x = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^{2k-1}}{(2k-1)!}.$$

If we differentiate term by term we obtain the series

$$\sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^{2k-2}}{(2k-2)!},$$

which we recognize as the series for the cosine. In this case, differentiation term by term is all right.

Example 1 Let $f_n(x) = (1 - x^2)^n$ for $-1 < x < 1$. For each $x \ne 0$, we have $f_n(x) \to 0$, and for $x = 0$ we have $f_n(0) = 1$ for all $n$. Thus $\{f_n(x)\}$ converges for each $x$, $-1 < x < 1$, and each term is perfectly differentiable; but the limit is not even continuous at 0.

Exercise 1 Show that integration term by term is all right in Example 1.

Example 2 Define $f_n$ on $[0, 1]$ so that on $[0, 1/n]$ the graph of $f_n$ is the isosceles triangle with height $2n$, and so that $f_n = 0$ on $[1/n, 1]$ (Figure 1).

[Figure 1: graph of $f_n$, with the points $1/n$ and $1$ marked on the horizontal axis]

It is plain that $f_n(x) \to 0$ for every $x$, while

$$\int_0^1 f_n(x)\,dx = 1 \quad\text{for every } n.$$

This is a case where we cannot integrate term by term.

Exercise 2 Write the equation for $f_n$ and verify the assertions above.

The examples show that pointwise convergence of a sequence of functions (that is, convergence at each point) is insufficient to imply very much about the limit function, and certainly is insufficient to allow either integration or differentiation term by term. What is needed is a stronger kind of convergence.

DEFINITION 4.1 The sequence $\{f_n\}$ of real-valued functions on the set $I$ converges uniformly to the function $f$ if for each positive number $\varepsilon$ there is a positive integer $n_0$ such that

$$|f(x) - f_n(x)| < \varepsilon \quad\text{for all } x \in I \text{ and all } n \ge n_0.$$

Exercise 3 Explain the distinction between pointwise convergence and uniform convergence. Compare it with the distinction between continuity and uniform continuity.

Exercise 4 Show that the sequences in Examples 1 and 2 do not converge uniformly. In both cases, however, show that the convergence is uniform on any closed subinterval that does not contain 0.

Exercise 5 Define the notions of pointwise Cauchy and uniformly Cauchy sequences of functions.

Exercise 6 Every uniformly convergent sequence is pointwise convergent, and every uniformly Cauchy sequence is pointwise Cauchy.
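The behavior described in Example 2 (triangles of height $2n$ on the base $[0, 1/n]$) is easy to observe numerically. The Python sketch below is one possible way to write the function; the explicit formula in `f` is an assumption consistent with the description in the example, and the helper `integral` uses a plain midpoint rule, so its values are approximations.

```python
def f(n, x):
    """Isosceles triangle with base [0, 1/n] and height 2n; zero elsewhere on [0, 1]."""
    if x <= 0.0 or x >= 1.0 / n:
        return 0.0
    half = 0.5 / n                       # the peak sits at x = 1/(2n)
    if x <= half:
        return 2.0 * n * (x / half)      # rising edge
    return 2.0 * n * ((1.0 / n - x) / half)  # falling edge

def integral(n, steps=100000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f(n, (i + 0.5) * h) for i in range(steps))

x = 0.01
print([f(n, x) for n in (10, 50, 100, 1000)])   # eventually 0 at this fixed x
print([integral(n) for n in (3, 10)])           # stays near 1
```

At each fixed $x$ the values are eventually 0, yet the maximum $2n$ of $f_n$ is unbounded and the integrals do not tend to 0, so the convergence cannot be uniform.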




Exercise 7 If $I$ is a finite set, then every pointwise convergent (or Cauchy) sequence is uniformly convergent (or Cauchy).

Exercise 8 Define uniform convergence of a series of functions. Show that the Taylor series for $\sin x$ converges uniformly on each interval $[-r, r]$, but not on the whole real line.


THEOREM 4.2 Every pointwise Cauchy sequence converges pointwise. Every uniformly Cauchy sequence converges uniformly.

Proof If $\{f_n\}$ is pointwise Cauchy, then for each $x \in I$ the sequence $\{f_n(x)\}$ is a Cauchy sequence of real numbers. Let $f(x)$ be the limit, which exists by Theorem 3.3. Then $f$ is a real-valued function on $I$, and $f_n \to f$ pointwise.

Let $\{f_n\}$ be a uniformly Cauchy sequence, and let $f$ be the pointwise limit, which exists because of Exercise 6 and the first part of the theorem. It will be shown that $f_n \to f$ uniformly. Let $\varepsilon > 0$ be given and choose $n_0$ so that if $n \ge n_0$ and $m \ge n_0$, then

$$|f_n(x) - f_m(x)| < \varepsilon \quad\text{for all } x \in I.$$

This is possible because $\{f_n\}$ is uniformly Cauchy. The $n_0$ so determined is the one we are looking for. Indeed, let $x$ be any point of $I$. Since $f_m(x) \to f(x)$, we can choose $m \ge n_0$ so that

$$|f_m(x) - f(x)| < \varepsilon.$$

Then, if $n \ge n_0$, we have

$$|f_n(x) - f(x)| \le |f_n(x) - f_m(x)| + |f_m(x) - f(x)| < 2\varepsilon.$$

THEOREM 4.3 Let $I$ be a set of real numbers. If $f_n \to f$ uniformly and each $f_n$ is continuous, then $f$ is continuous.

Proof Let $\varepsilon > 0$ and $a \in I$ be given. First choose $n_0$ so that if $n \ge n_0$, then

$$|f_n(x) - f(x)| < \varepsilon \quad\text{for all } x \in I. \tag{1}$$

Take any fixed $n \ge n_0$, and use the fact that $f_n$ is continuous at $a$ to find $\delta > 0$ so that

$$|f_n(x) - f_n(a)| < \varepsilon \quad\text{if } |x - a| < \delta. \tag{2}$$

If $|x - a| < \delta$, then we have

$$|f(x) - f(a)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(a)| + |f_n(a) - f(a)|.$$

Each of the three terms is less than $\varepsilon$: the first and last because of (1) and the second because of (2).


Exercise 9 Show that if each $f_n$ is uniformly continuous, then $f$ is uniformly continuous.

Example 1 shows that a pointwise limit of continuous functions is not necessarily continuous.

THEOREM 4.4 Let $I = [a, b]$. If $f_n \to f$ uniformly and each $f_n$ is integrable, then $f$ is integrable, and

$$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx.$$

Proof Let $\varepsilon > 0$ be given and choose $n_0$ so that if $n \ge n_0$,

$$f_n(x) - \varepsilon \le f(x) \le f_n(x) + \varepsilon \quad\text{for } a \le x \le b. \tag{3}$$

If $f$ is known to be integrable (for instance, if each $f_n$ is continuous, then Theorem 4.3 shows that $f$ is integrable), then we can simply integrate inequality (3) to get

$$\int_a^b f_n(x)\,dx - \varepsilon(b - a) \le \int_a^b f(x)\,dx \le \int_a^b f_n(x)\,dx + \varepsilon(b - a) \tag{4}$$

for $n \ge n_0$, which proves the theorem.

Since $f$ is not known to be integrable, we have to go back to partitions. Inequality (3) gives

$$\bar{S}(p; f) \le \bar{S}(p; f_n + \varepsilon) = \bar{S}(p; f_n) + \varepsilon(b - a)$$

for every partition $p$; hence

$$\bar{S}(f) \le \bar{S}(f_n) + \varepsilon(b - a).$$

With a similar inequality for $\underline{S}$, we have

$$\underline{S}(f_n) - \varepsilon(b - a) \le \underline{S}(f) \le \bar{S}(f) \le \bar{S}(f_n) + \varepsilon(b - a).$$

Since $f_n$ is integrable, we get $\bar{S}(f) - \underline{S}(f) \le 2\varepsilon(b - a)$, which shows that $f$ is integrable. (So the initial part of the proof applies.)


Example 2 shows that the theorem is false with pointwise convergence instead of uniform convergence. (However, we shall prove a much better theorem in Chapter 13, which shows that pointwise convergence is almost good enough.) Example 1 shows that the corresponding theorem on differentiation is false. In this case much stronger hypotheses are needed: effectively, uniform convergence of the derivatives.


THEOREM 4.5 Let $I$ be an open interval, and assume that
(a) Each $f_n$ is differentiable at every point of $I$ and $f_n'$ is continuous.
(b) $\{f_n'\}$ converges uniformly on each closed subinterval.
(c) For some $a \in I$, $\{f_n(a)\}$ converges.
Then $\{f_n\}$ converges uniformly on each closed subinterval to a limit $f$, and $f_n' \to f'$ (uniformly on each closed subinterval).

Proof Let $f_n' \to g$ and let $f_n(a) \to b$. By Theorem 4.4,

$$\int_a^x g(t)\,dt = \lim_{n\to\infty} \int_a^x f_n'(t)\,dt = \lim_{n\to\infty} \big(f_n(x) - f_n(a)\big) = \lim_{n\to\infty} f_n(x) - b.$$

This shows that $f(x) = \lim f_n(x)$ exists for every $x$ and that

$$f(x) = b + \int_a^x g(t)\,dt.$$

It follows from the fundamental theorem of calculus that $f'(x) = g(x)$ at every point where $g$ is continuous, i.e., at every point, by virtue of Theorem 4.3.

Exercise 10 The proof used the fact that if $s_n \to s$ and $t_n \to t$, then $s_n + t_n \to s + t$. Prove this and point out where it was used.

Exercise 11 Why does $f_n \to f$ uniformly on each closed subinterval?

Exercise 12 Instead of (c), assume that $f_n(x) \to f(x)$ for every $x \in I$. Now prove the theorem without using integration. [Hint: By the mean-value theorem we have

$$\frac{f_n(x) - f_n(a)}{x - a} - g(a) = f_n'(\xi_n) - g(a) = g(\xi_n) - g(a) + f_n'(\xi_n) - g(\xi_n);$$

hence

$$\Big|\frac{f_n(x) - f_n(a)}{x - a} - g(a)\Big| \le |g(\xi_n) - g(a)| + |f_n'(\xi_n) - g(\xi_n)|.$$

The second term is small by the uniform convergence and the first term is small by the continuity of $g$.]


The above results apply equally well to series. The usual way to show 
that a series of functions converges uniformly is to find a sequence of numbers 
M,, such that 


$$|f_n(x)| \le M_n \quad\text{for all } x, \text{ and } \sum M_n \text{ converges.} \tag{5}$$


THEOREM 4.6 (Weierstrass M Test) If (5) holds for $n \ge n_0$, then the series $\sum f_n$ converges uniformly and absolutely.

Exercise 13 Prove the theorem. (Hint: Use Theorem 4.2 and look back at the proof of Theorem 3.4.)

Exercise 14 As an introduction to Section 5, show that each of the Taylor series in formulas (1) through (6) of Section 1 converges uniformly on any closed subinterval of the interval indicated and that the differentiated series do the same.


5 POWER SERIES

The results of Section 4 are especially pretty for the special series called power series. A power series with center $a$ is a series $\sum a_k (x - a)^k$. The Taylor series of a function is always a power series, and, as a matter of fact, every power series is the Taylor series of a function, but this will be proved only when the series converges for at least one point $x \ne a$. The principal theorem is as follows.

THEOREM 5.1 Let $f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k$. Then

(a) There is an $r \ge 0$ (possibly $\infty$) such that the series converges for $|x - a| < r$ and diverges for $|x - a| > r$.

(b) The convergence is absolute and uniform on each closed subinterval of $|x - a| < r$.

(c) The series can be differentiated term by term; that is,

$$f'(x) = \sum_{k=1}^{\infty} k a_k (x - a)^{k-1} \quad\text{for } |x - a| < r$$

(and this series also diverges for $|x - a| > r$).


It should be emphasized that the series is what is given, and the function f 
is defined to be the sum of the series at whatever points x the sum exists. The 
theorem then asserts that this set of points must be an interval with center a. 
The number $r$ is called the radius of convergence of the series. It may be 0, in which case the series converges for no point $x \ne a$.

According to part (c), the derivative of $f$ exists on $|x - a| < r$ and is obtained by differentiating term by term just as if the sum were finite. Moreover, the radius of convergence of the differentiated series is precisely the same number $r$. Since the differentiated series is again a power series, the theorem can be applied to conclude that $f''$ exists on the same interval $|x - a| < r$ and is obtained by differentiating term by term, and again to conclude that $f'''$ exists, and so on.
COROLLARY 5.2 If $f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k$ on $|x - a| < r$, $r \ne 0$, then $f$ has derivatives of all orders on $|x - a| < r$, and

$$a_k = \frac{f^{(k)}(a)}{k!}.$$

This corollary shows that the coefficients in a power series are uniquely determined by the sum $f$: the series has to be the Taylor series of $f$. To prove the last part, use part (c) of the theorem to differentiate $m$ times and get

$$f^{(m)}(x) = \sum_{k=m}^{\infty} k(k-1)\cdots(k-m+1)\, a_k (x - a)^{k-m}.$$

Now put $x = a$. The only nonzero term is the one with $k = m$, and the formula gives $f^{(m)}(a) = m!\,a_m$.

Before taking up the general theorem, we shall discuss the particular series $\sum y^k$. The general theorem can be reduced to this particular case by the theorems of the last section. In Section 2 of Chapter 5 the following formula was established:

$$\frac{1}{1 - y} = 1 + \sum_{k=1}^{n} y^k + \frac{y^{n+1}}{1 - y}. \tag{1}$$

Differentiation gives

$$\frac{1}{(1 - y)^2} = \sum_{k=1}^{n} k y^{k-1} + \frac{(n+1)y^n}{1 - y} + \frac{y^{n+1}}{(1 - y)^2}. \tag{2}$$

LEMMA 5.3 The series $\sum y^k$ converges if $|y| < 1$ and diverges if $|y| \ge 1$. The series $\sum k y^k$ converges if $|y| < 1$ and diverges if $|y| \ge 1$.

The first part of the lemma is plain from formula (1). The second part follows from formula (2), but is not so plain. It depends on the fact that

$$\lim_{n\to\infty} n y^n = 0 \quad\text{if } |y| < 1. \tag{3}$$

Exercise 1 Formula (3) was already established in Exercise 4, Section 5 of Chapter 3. If you do not remember it, do it again with l'Hospital's rule. Use this to complete the proof of the lemma.




It is quite surprising that the general Theorem 5.1 reduces to this special 
case, and also that there is a simple formula for the number r. 


THEOREM 5.4 The radius of convergence of the series $\sum a_k (x - a)^k$ is the number $r$ defined by

$$\frac{1}{r} = \limsup\, (|a_k|)^{1/k}, \tag{4}$$

and $r = 0$ if $\{(|a_k|)^{1/k}\}$ is unbounded.
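Formula (4) can be explored numerically, with the caveat that a limit superior cannot be computed from finitely many terms. The Python sketch below replaces the lim sup by a maximum of $|a_k|^{1/k}$ over a finite range $k_{lo} \le k < k_{hi}$; this stand-in, like the name `radius_estimate` and the three example coefficient sequences, is an assumption made for illustration.

```python
def radius_estimate(coeff, k_lo, k_hi):
    """Estimate r from 1/r = lim sup |a_k|^(1/k), using a max over k_lo <= k < k_hi."""
    sup_root = max(abs(coeff(k)) ** (1.0 / k) for k in range(k_lo, k_hi))
    return float("inf") if sup_root == 0.0 else 1.0 / sup_root

# a_k = 2^k: the series sum (2x)^k has radius 1/2.
print(radius_estimate(lambda k: 2.0 ** k, 100, 400))

# a_k = k: the series sum k y^k of Lemma 5.3 has radius 1; the estimate creeps up slowly.
print(radius_estimate(lambda k: float(k), 200, 400))

# a_k = (-1)^(k/2) for even k and 0 for odd k: the series sum (-1)^k x^(2k), radius 1.
print(radius_estimate(lambda k: 0.0 if k % 2 else (-1.0) ** (k // 2), 100, 400))
```

The third example shows why a lim sup is needed rather than a limit: the roots $|a_k|^{1/k}$ alternate between 0 and 1 and do not converge.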


Proof of Theorems 5.1 and 5.4 Suppose that the series converges for some $x \ne a$. According to Exercise 5, Section 3, $a_k (x - a)^k \to 0$; so, in particular, there is a positive integer $k_0$ such that if $k \ge k_0$, then $|a_k (x - a)^k| \le 1$. Therefore,

$$(|a_k|)^{1/k} \le \frac{1}{|x - a|} \quad\text{if } k \ge k_0,$$

which implies that

$$\frac{1}{r} = \limsup\, (|a_k|)^{1/k} \le \frac{1}{|x - a|}$$

and hence that $|x - a| \le r$. This proves half of part (a) of Theorem 5.1.

To prove part (b) and with it the other half of (a), let $r'$ be any positive number smaller than $r$, and let $r''$ be any number between them. Since $1/r'' > 1/r$, it follows from the definition of $r$ that there is a positive integer $k_0$ such that

$$(|a_k|)^{1/k} < \frac{1}{r''} \quad\text{if } k \ge k_0.$$

Therefore,

$$|a_k| < \Big(\frac{1}{r''}\Big)^k \quad\text{if } k \ge k_0.$$

Consequently, with $y = r'/r'' < 1$,

$$|a_k (x - a)^k| \le \Big(\frac{r'}{r''}\Big)^k = y^k \quad\text{and}\quad |k a_k (x - a)^k| \le k y^k \quad\text{if } k \ge k_0 \text{ and } |x - a| \le r'. \tag{5}$$

The lemma and Theorem 4.6 show that both series $\sum a_k (x - a)^k$ and $\sum k a_k (x - a)^k$ converge absolutely and uniformly on $|x - a| \le r'$. Then Theorem 4.5 shows that the term-by-term differentiation in (c) is all right. The only thing that remains is to show that the differentiated series cannot actually have a larger radius of convergence. If it did we could use Theorem 4.4 to integrate term by term and get a larger radius of convergence for the original.

Exercise 2 Write out the proof of this last statement. [You will have to use part (b) of the present theorem!]

Usually the easiest way to find the radius of convergence of a power series is to use the following theorem (called the ratio test):

THEOREM 5.5 (Ratio Test) If $b_k > 0$, then the series $\sum b_k$ converges if

$$\limsup \frac{b_{k+1}}{b_k} < 1$$

and diverges if

$$\liminf \frac{b_{k+1}}{b_k} > 1.$$

Example The Taylor series for $e^x$ is the series

$$\sum_{k=0}^{\infty} \frac{x^k}{k!}.$$

With $x$ fixed, take $b_k = |x^k/k!|$. Then

$$\lim \frac{b_{k+1}}{b_k} = \lim \frac{|x|}{k+1} = 0.$$

Therefore, the series converges absolutely for each $x$.

Exercise 3 Apply the theorem to find the radius of convergence of the several Taylor series in Section 1.

Proof of the Theorem If

$$\limsup \frac{b_{k+1}}{b_k} < 1,$$

then there is a number $y < 1$ such that

$$\limsup \frac{b_{k+1}}{b_k} < y,$$

and there is a positive integer $k_0$ such that

$$b_{k+1} < y\, b_k \quad\text{if } k \ge k_0.$$

Hence

$$b_{k_0+1} < y\, b_{k_0}, \quad b_{k_0+2} < y\, b_{k_0+1} < y^2 b_{k_0}, \quad b_{k_0+3} < y\, b_{k_0+2} < y^3 b_{k_0}, \ \ldots.$$

In general,

$$b_{k_0+p} < y^p\, b_{k_0} \quad\text{for every } p.$$

The series converges by comparison with $\sum y^p$, $y < 1$.


Exercise 4 Prove the other half of the theorem by using a similar argument to show that 
the sequence {b,} is unbounded (hence does not approach 0). 


The theorem is effective on power series because of the cancellation between the powers of $x - a$ in one term and the next. Usually it is not much good on other series. Too often the limits in question turn out to be 1.

Note that the theorem can show that a Taylor series converges, but it 
cannot show that the Taylor series converges to the function. This is the topic 
of the next section. 
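The ratio test computation for the series $\sum x^k/k!$ can be reproduced numerically. In the Python sketch below the names `b`, `ratios`, and `partial_sum` are hypothetical. The comparison with `math.exp` at the end is only an observation for one value of $x$; as noted above, the ratio test itself says nothing about what the series converges to.

```python
import math

def b(x, k):
    """b_k = |x^k / k!| for fixed x."""
    return abs(x) ** k / math.factorial(k)

def ratios(x, k_max):
    """The ratios b_{k+1} / b_k for k = 0, ..., k_max - 1 (x must be nonzero)."""
    return [b(x, k + 1) / b(x, k) for k in range(k_max)]

def partial_sum(x, n):
    """Sum of x^k / k! for k = 0..n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

r = ratios(3.0, 40)
print(r[0], r[9], r[39])          # equals |x|/(k+1), tending to 0
print(partial_sum(3.0, 60), math.exp(3.0))
```

The ratios equal $|x|/(k+1)$, so they fall below any fixed $y < 1$ once $k$ is large, which is exactly the situation in the proof of the ratio test.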


6 ANALYTIC FUNCTIONS 


The point of departure in the last section was the series. We established various 
criteria of convergence and properties of the sum. The point of departure here 
is the function. We begin with a function and ask when the Taylor series not 
only converges, but converges to the function. Because of Corollary 5.2, we 
consider only functions that possess derivatives of all orders on some interval 
with center at the point in question. 


DEFINITION 6.1 A function $f$ is of class $C^\infty$ at a point $a$ if there is a positive number $r$ such that $f$ is defined and possesses derivatives of all orders on the interval $|x - a| < r$.

DEFINITION 6.2 A function $f$ is analytic at a point $a$ if there exist a positive number $r$ and a power series $\sum a_k (x - a)^k$ such that

$$f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k \quad\text{for } |x - a| < r. \tag{1}$$

It is implicit, of course, that $f$ is defined on $|x - a| < r$ and that the series converges on $|x - a| < r$. It follows from Corollary 5.2 that if $f$ is analytic at $a$, then $f$ is $C^\infty$, and the series in (1) must be the Taylor series. The function

$$f(x) = e^{-1/x^2} \ (x \ne 0), \qquad f(0) = 0,$$


is one which is $C^\infty$ at every point, but is not analytic at 0. Its Taylor series at 0 is identically 0.

Note that if $f$ is of class $C^\infty$ at $a$, then it is also of class $C^\infty$ at every point $x$ sufficiently close to $a$; indeed, at every point $x$ with $|x - a| < r$, with $r$ as in Definition 6.1. On the other hand, suppose that $f$ is analytic at $a$. It follows from Corollary 5.2 that $f$ is $C^\infty$ at every point $x$ with $|x - a| < r$, with $r$ as in Definition 6.2. It is true, but not implied by Corollary 5.2, that $f$ is analytic at every such point.

It is obvious that the sum and product of $C^\infty$ functions is $C^\infty$ and almost obvious that the quotient is $C^\infty$ if the denominator is $\ne 0$ at the point $a$. It is obvious that the sum of analytic functions is analytic, but it is not at all obvious that the product or quotient is. Both this question and the one raised in the last paragraph can be settled by the following theorem.

THEOREM 6.3 Let $f$ be of class $C^\infty$ at the point $a$. Then $f$ is analytic at $a$ if and only if there exist positive numbers $\delta$ and $M$ such that

$$|f^{(k)}(x)| \le M^k k! \quad\text{for } |x - a| < \delta. \tag{2}$$

Proof Suppose first that $f$ satisfies condition (2). We shall prove that $f$ is analytic by using Taylor's formula. The remainder term in Taylor's formula satisfies

$$\Big|\frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1}\Big| \le M^{n+1}|x - a|^{n+1} \quad\text{for } |x - a| < \delta.$$

The term on the right goes to 0 if $M|x - a| < 1$. Therefore, the Taylor series of $f$ converges to $f$ if $|x - a| < r$, where

$$r = \min\Big(\frac{1}{M}, \delta\Big).$$

Suppose that $f$ is analytic and that

$$f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k \quad\text{for } |x - a| < r, \tag{3}$$

and, therefore, that

$$f^{(m)}(x) = \sum_{k=m}^{\infty} k(k-1)\cdots(k-m+1)\, a_k (x - a)^{k-m} \quad\text{for } |x - a| < r. \tag{4}$$

As in Section 5, take any $r' < r$ and then $r''$ between the two. Since the series (3) converges for $|x - a| = r''$, the sequence $\{a_k (r'')^k\}$ goes to 0;


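so there is a constant $A$ such that

$$|a_k (r'')^k| \le A, \quad\text{or}\quad |a_k| \le A (r'')^{-k}, \quad\text{for all } k.$$

If $|x - a| \le r'$, then by (4)

$$|f^{(m)}(x)| \le A \sum_{k=m}^{\infty} k(k-1)\cdots(k-m+1)\,(r'')^{-k} (r')^{k-m}.$$

If we put $y = r'/r''$, this gives

$$|f^{(m)}(x)| \le A (r'')^{-m} \sum_{k=m}^{\infty} k(k-1)\cdots(k-m+1)\, y^{k-m}.$$

The sum on the right is known. It is just the $m$th derivative of the function $g(y) = 1/(1 - y)$, which is $m!(1 - y)^{-m-1}$. (Differentiate the Taylor series of $g$.) Therefore,

$$|f^{(m)}(x)| \le A (r'')^{-m} (1 - y)^{-m-1}\, m! \le M^m m! \quad\text{for } |x - a| \le r', \tag{5}$$

if we choose $M$ large enough so that

$$A (r'')^{-m} (1 - y)^{-m-1} \le M^m. \tag{6}$$

Exercise 1 If $A \ge 1$, which plainly can be assumed, then (6) holds for every $m \ge 1$ if

$$M = \frac{A}{r''(1 - y)^2}.$$

THEOREM 6.4 If $f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k$ for $|x - a| < r$, then $f$ is analytic at each point of the interval $|x - a| < r$.

Proof Given a point $x$ with $|x - a| < r$, choose $r'$ with $|x - a| < r' < r$. Then use inequality (5) and the theorem.

THEOREM 6.5 If $f$ and $g$ are analytic at $a$, then so is $fg$. If

$$f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k, \qquad g(x) = \sum_{l=0}^{\infty} b_l (x - a)^l \quad\text{for } |x - a| < r,$$

then

$$f(x)g(x) = \sum_{n=0}^{\infty} c_n (x - a)^n, \quad\text{where}\quad c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$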


n=0 k=0 




Note that on the one hand, $c_n$ is what it must be by virtue of Theorem 2.3
of Chapter 5 on the nth derivative of the product fg. On the other hand, it is 
just what is obtained by multiplying the series together term by term and 
collecting together all the terms with a given exponent n. 


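Proof Take any $r' < r$ and use (5) to find $M$ such that

$$|f^{(m)}(x)| \le M^m m! \quad\text{and}\quad |g^{(m)}(x)| \le M^m m! \quad\text{for } |x - a| \le r'.$$

Theorem 2.3 of Chapter 5 gives

$$|(fg)^{(k)}(x)| = \Big|\sum_{l=0}^{k} \binom{k}{l} f^{(l)}(x)\, g^{(k-l)}(x)\Big| \le M^k (k + 1)!$$

for $|x - a| \le r'$. The remainder in Taylor's formula for $fg$ satisfies

$$\Big|\frac{(fg)^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1}\Big| \le (n + 2) M^{n+1} |x - a|^{n+1} \quad\text{for } |x - a| \le r'.$$

Several times we have seen that this converges to 0 if $M|x - a| < 1$, which shows that the Taylor series of $fg$ converges to $fg$ on $|x - a| < \delta$, with $\delta = \min(1/M, r')$.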


This brings up an interesting point. On the one hand, the theorem shows that $fg$ is analytic at every point $x$ with $|x - a| < r$. On the other hand, the Taylor series of $fg$ converges at every point $x$ with $|x - a| < r$.


Exercise 2. Prove the last statement. 


This suggests a general theorem. 


THEOREM 6.6 If $f$ is analytic at every point of $|x - a| < r$ and the Taylor series converges at every point of $|x - a| < r$, then the Taylor series converges to $f$ at every point of $|x - a| < r$.


Exercise 3. Prove Theorem 6.6 by using Theorem 6.7 below. 


THEOREM 6.7 Let $f$ be analytic at each point of an open interval $I$. If $f$ vanishes identically on some open subinterval, then $f$ vanishes identically on $I$.


Proof Choose a point $c$ in the subinterval and let $b = \sup\{y : f(x) = 0 \text{ for } c \le x \le y\}$. What we have to show is that $b$ is the right-hand end point of $I$. Then the same proof will show that $a = \inf\{y : f(x) = 0 \text{ for } y \le x \le c\}$ is the left-hand end point, and the theorem will be proved.

If $b$ is not the right-hand end point of $I$, then $f$ is analytic at $b$, so

$$f(x) = \sum_{k=0}^{\infty} a_k (x - b)^k \quad\text{for } |x - b| < \delta$$

for some positive $\delta$. On the other hand, since $f$ vanishes identically between $c$ and $b$, it follows that all derivatives of $f$ vanish at $b$, whence $f(x) = 0$ for $b \le x < b + \delta$, which is a contradiction.

This whole business is rather slippery. Theorem 6.6 suggests a stronger theorem, namely that if $f$ is analytic at every point of $|x - a| < r$, then the Taylor series of $f$ at $a$ converges to $f$ at every point of $|x - a| < r$. This is false. It is not hard to see (and follows from the next theorem) that

$$f(x) = \frac{1}{1 + x^2}$$

is analytic at every point. On the other hand, its Taylor series

$$\sum_{k=0}^{\infty} (-1)^k x^{2k}$$

at $a = 0$ has the radius of convergence 1.

The explanation lies in the fact that power series should be considered in the complex domain, not in the real. Considering complex numbers $x$, as well as real ones, we see that $f$ behaves badly as $x \to \pm i$. The distance from $\pm i$ to $a = 0$ is 1, and this is why the radius of convergence is 1. With the right theorems in the complex domain, the last several theorems, and the next one as well, become trivial.

THEOREM 6.8 If $f$ is analytic at $a$ and $f(a) \ne 0$, then $1/f$ is analytic at $a$.

This theorem is included partly for the sake of the result and partly for the sake of the proof, which is typical of a number of proofs in analytic function theory.

Proof Let

$$f(x) = \sum_{k=0}^{\infty} a_k (x - a)^k.$$

It is clearly permissible to suppose that $a_0 = f(a) = 1$. (Prove this.) What is needed is a function $g$, analytic on some interval with center $a$,


such that $f(x)g(x) = 1$ on some interval with center $a$. To get started, suppose that

$$g(x) = \sum_{k=0}^{\infty} b_k (x - a)^k \tag{7}$$

is such a function. According to Theorem 6.5, it must be true that

$$\sum_{k=0}^{n} a_k b_{n-k} = \begin{cases} 1 & \text{if } n = 0, \\ 0 & \text{if } n > 0, \end{cases}$$

or, equivalently (taking account of the fact that $a_0 = 1$), that

$$b_0 = 1 \quad\text{and}\quad b_n = -\sum_{k=1}^{n} a_k b_{n-k}. \tag{8}$$

Now start afresh and use formula (8) to define the sequence $\{b_n\}$. This is an inductive definition. It determines $b_0$ first. And once $b_0, \ldots, b_{n-1}$ are determined, it determines $b_n$. By virtue of Theorem 6.5, it is now entirely a question of proving that the series (7), with coefficients determined by formula (8), converges for some $x \ne a$. By Theorem 5.4, this is equivalent to proving that for some number $N$

$$|b_n| \le N^n \quad\text{for all } n. \tag{9}$$

Since the series for $f$ does converge for some $x \ne a$, there is a number $M$ such that

$$|a_n| \le M^n \quad\text{for all } n. \tag{10}$$

Let us try to prove formula (9) with $N = 2M$. By induction and (8) we have [if (9) holds for integers $< n$]

$$|b_n| \le \sum_{k=1}^{n} M^k N^{n-k} = N^n \sum_{k=1}^{n} \frac{1}{2^k} < N^n,$$

which proves (9), and hence the theorem.
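The inductive definition used in this proof, $b_0 = 1$ and $b_n = -\sum_{k=1}^{n} a_k b_{n-k}$, is directly computable. The Python sketch below carries it out in exact rational arithmetic; the function name `reciprocal_coefficients` and the convention that coefficients missing from the input list are 0 are assumptions made for this illustration.

```python
from fractions import Fraction

def reciprocal_coefficients(a, n_terms):
    """Return b_0, ..., b_{n_terms-1} for 1/f, given the coefficients a of f with a[0] = 1."""
    def coeff(k):
        # Coefficients beyond the end of the list are taken to be 0.
        return Fraction(a[k]) if k < len(a) else Fraction(0)

    if coeff(0) != 1:
        raise ValueError("the recursion assumes a_0 = 1")
    b = [Fraction(1)]
    for n in range(1, n_terms):
        # b_n = -(a_1 b_{n-1} + a_2 b_{n-2} + ... + a_n b_0)
        b.append(-sum(coeff(k) * b[n - k] for k in range(1, n + 1)))
    return b

# f(x) = 1 - x: the reciprocal is the geometric series, all coefficients 1.
print(reciprocal_coefficients([1, -1], 6))

# f(x) = 1 + x^2: the reciprocal has coefficients 1, 0, -1, 0, 1, 0, -1, ...
print(reciprocal_coefficients([1, 0, 1], 7))
```

The second example recovers the series $\sum (-1)^k x^{2k}$ discussed before Theorem 6.8.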


In somewhat the same vein, there is a theorem on composite functions. It is stated as follows, but will not be proved, since there is no advantage in laboring things that become obvious once the right (complex) point of view is adopted.


THEOREM 6.9 If $g$ is analytic at $a$, and $f$ is analytic at $b = g(a)$, then the composite function $h(x) = f(g(x))$ is analytic at $a$.


7 EXAMPLES


The exponential and trigonometric functions can be defined easily by power series. This avoids some of the sticky points about arc length, for instance, but it requires fairly substantial knowledge of series.

Take first the exponential. Define

$$E(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \tag{1}$$

The ratio test (Theorem 5.5) shows that the series converges for every $x$. Differentiation term by term shows that

$$E'(x) = E(x). \tag{2}$$

The basic formula for the exponential is

$$E(a)E(b) = E(a + b). \tag{3}$$

To prove it, let $f(x) = E(ax)$ and $g(x) = E(bx)$. By Theorem 6.5,

$$f(x)g(x) = \sum_{n=0}^{\infty} c_n x^n, \quad\text{where}\quad c_n = \sum_{k=0}^{n} \frac{a^k}{k!} \frac{b^{n-k}}{(n-k)!} = \frac{(a + b)^n}{n!},$$

so $E(ax)E(bx) = E((a + b)x)$, which gives (3) when $x = 1$.

If $x \ge 0$, then each term in the series is $\ge 0$, so $E(x) > 0$. On the other hand, (3) shows that $E(-x) = 1/E(x)$, so $E(x) > 0$ for all $x$. Hence $E'(x) = E(x) > 0$ for all $x$, and $E$ is strictly increasing.

Again, if $x > 0$, then each term in the series is $> 0$, so the sum is larger than any one term; hence $E(x) > x$, and

$$\lim_{x\to\infty} E(x) = \infty.$$

The fact that $E(-x) = 1/E(x)$ gives

$$\lim_{x\to-\infty} E(x) = 0.$$

Consequently, $E$ has a differentiable inverse $L$ defined on $0 < y < \infty$ by $L(y) = x$ if and only if $E(x) = y$. From (2) and the rule for differentiating composite functions it follows that

$$L'(y) = \frac{1}{y}.$$
This should be enough to show how the development goes. 
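The key step in the proof of $E(a)E(b) = E(a + b)$ is the identity $\sum_{k=0}^{n} \frac{a^k}{k!}\frac{b^{n-k}}{(n-k)!} = \frac{(a+b)^n}{n!}$, which is the binomial theorem divided by $n!$. The Python sketch below checks it in exact rational arithmetic for a few values and then compares partial sums of the series for $E$. The names `cauchy_coefficient` and `E`, and the choice of 40 terms, are assumptions for this illustration.

```python
from fractions import Fraction
from math import factorial

def cauchy_coefficient(a, b, n):
    """c_n = sum over k = 0..n of (a^k / k!) (b^(n-k) / (n-k)!), computed exactly."""
    return sum(
        Fraction(a) ** k / factorial(k) * Fraction(b) ** (n - k) / factorial(n - k)
        for k in range(n + 1)
    )

def E(x, terms=40):
    """Partial sum of the series sum x^k / k! with the given number of terms."""
    return sum(x ** k / factorial(k) for k in range(terms))

for n in range(6):
    print(n, cauchy_coefficient(2, 3, n), Fraction(5 ** n, factorial(n)))

print(E(1.0) * E(2.0), E(3.0))
```

The floating-point comparison in the last line is only approximate, since it uses truncated series and rounded arithmetic.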




Exercise 1 Define $e$ to be $E(1)$ and show that $E(x) = e^x$ when $x$ is rational.

Remark 1 The function $L$ is actually analytic at each point $y > 0$. The relevant theorem is the following:

THEOREM 7.1 Let $f$ be analytic at $a$, and let $f'(a) \ne 0$. Then $f$ has an inverse which is defined on some interval with center $b = f(a)$ and is analytic at $b$.


Note that the existence and differentiability of the inverse are known already. Since $f$ is analytic, the derivative $f'$ must exist on some interval with center $a$. Furthermore, it must be $\ne 0$ on a possibly smaller interval with center $a$, since it is continuous and $f'(a) \ne 0$. This implies that $f$ is either strictly increasing or strictly decreasing there, so the inverse exists. The differentiability follows from Section 1 of Chapter 3. Note also that the condition $f'(a) \ne 0$ is necessary for the existence of a differentiable inverse (differentiate the composite), and therefore certainly for the existence of an analytic inverse.

The fact that the inverse is analytic is another one of those problems that 
is easy from the complex point of view, and a nuisance at present. It can be 
solved along lines similar to those in Theorem 6.8, but we shall not do it. 


Remark 2 In proving formula (3) it would be more natural simply to multiply together 
the series for E(a) and E(b). So far we have considered the multiplication of 
power series but not of general series. 


THEOREM 7.2 If $a = \sum_{k=0}^{\infty} a_k$ and $b = \sum_{k=0}^{\infty} b_k$ and the series converge absolutely, then

$$ab = \sum_{n=0}^{\infty} c_n, \quad\text{where}\quad c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$

Proof Let

$$f(x) = \sum_{k=0}^{\infty} a_k x^k, \qquad g(x) = \sum_{k=0}^{\infty} b_k x^k, \qquad h(x) = \sum_{n=0}^{\infty} c_n x^n.$$

According to Theorem 6.5, we do have that $h(x) = f(x)g(x)$ for $|x| < 1$,
and the whole point is to justify putting $x = 1$. First it must be shown
that $h$ is defined at $x = 1$.

Exercise 2 Show that the series $\sum c_n$ converges absolutely. (Hint: This is easy.)

The theorem will be proved if it can be shown that $f$, $g$, and $h$ are continuous
at $x = 1$:


THEOREM 7.3 If $f(x) = \sum_{k=0}^{\infty} a_k(x - a)^k$ converges absolutely for $|x - a| = r$, then the
convergence is uniform on $|x - a| \le r$, so $f$ is continuous on $|x - a| \le r$.


Exercise 3. Prove the theorem. (Hint: This also is easy.) 
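The product formula for absolutely convergent series can be tried out on two geometric series. This is a hedged illustration only: the helper name `cauchy_product`, the two sample series, and the truncation at 60 terms are assumptions made for the example.

```python
def cauchy_product(a, b):
    """Return c with c[n] = sum of a[k] * b[n - k] for k = 0..n (equal-length lists)."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(len(a))]

N = 60                                     # number of terms kept from each series
a = [0.5 ** k for k in range(N)]           # geometric series with sum 2
b = [(-1.0 / 3.0) ** k for k in range(N)]  # geometric series with sum 3/4
c = cauchy_product(a, b)

print(sum(a) * sum(b))                     # product of the two (truncated) sums
print(sum(c))                              # sum of the product series
```

Both printed numbers are close to $2 \cdot \tfrac{3}{4} = \tfrac{3}{2}$; the terms dropped by the truncation are far below rounding error for these two series.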


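Now let us turn to the trigonometric functions. Let

$$S(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}, \qquad C(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!}. \tag{4}$$

The ratio test shows that both series converge for every $x$. Differentiation
term by term shows that

$$S'(x) = C(x), \qquad C'(x) = -S(x). \tag{5}$$

Setting $x = 0$ in the series, we get

$$S(0) = 0, \qquad C(0) = 1. \tag{6}$$

Consider the function $f(x) = S(x)^2 + C(x)^2$. We have

$$f' = 2SS' + 2CC' = 2SC - 2CS = 0.$$

Therefore, $f$ is constant, and by (6) the constant must be 1. Hence we have
the identity

$$S(x)^2 + C(x)^2 = 1 \quad\text{for all } x. \tag{7}$$

It follows that

$$|S(x)| \le 1 \quad\text{and}\quad |C(x)| \le 1 \quad\text{for all } x. \tag{8}$$

Consider the Taylor series for $S$ at an arbitrary point $a$ and for $x = a + h$:

$$S(a + h) = \sum_{n=0}^{\infty} \frac{S^{(n)}(a)}{n!} h^n.$$

This is valid for every $a$ and every $h$, for by virtue of (8) and (5) the remainder in
Taylor's formula satisfies

$$\left| \frac{S^{(n+1)}(\xi)}{(n+1)!} h^{n+1} \right| \le \frac{|h|^{n+1}}{(n+1)!} \to 0.$$

According to (5),

$$S^{(2k+1)}(x) = (-1)^k C(x), \qquad S^{(2k)}(x) = (-1)^k S(x). \tag{9}$$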


Substituting these in the Taylor series, we find that

$$S(a + h) = \sum_{k=0}^{\infty} (-1)^k C(a) \frac{h^{2k+1}}{(2k+1)!} + \sum_{k=0}^{\infty} (-1)^k S(a) \frac{h^{2k}}{(2k)!}.$$

Factoring out $C(a)$ and $S(a)$ and looking at the two series, we observe the
identity (addition formula for the sine)

$$S(a + h) = S(a)C(h) + C(a)S(h). \tag{10}$$

Differentiate with respect to $h$ to get

$$C(a + h) = S'(a + h) = C(a)C(h) - S(a)S(h). \tag{11}$$

Define the number $\pi$ by

$$\frac{\pi}{2} = \inf\{x : x > 0 \text{ and } C(x) = 0\}. \tag{12}$$

To do this we must show that there exists some $x > 0$ with $C(x) = 0$. If not,
then $C(x) > 0$ for all $x > 0$. Now, (11) and (7) give

$$C(2x) = C(x)^2 - S(x)^2 = 2C(x)^2 - 1,$$

so if $C(x) > 0$ for all $x > 0$, then it follows that

$$C(x) > \frac{1}{\sqrt{2}}; \quad\text{hence}\quad S(x) > \frac{1}{\sqrt{2}}\,x,$$

which contradicts (8).
Therefore, (12) makes sense and defines $\pi$. From the definition it follows
that

$$C\left(\frac{\pi}{2}\right) = 0 \quad\text{and}\quad C(x) > 0 \quad\text{for } 0 \le x < \frac{\pi}{2}. \tag{13}$$

From this it follows that $S(\pi/2) = 1$, and that $S$ is strictly increasing on
$0 \le x \le \pi/2$. Hence $S > 0$ for $0 < x \le \pi/2$, and $C' = -S$ is negative there. Therefore,
$C$ is strictly decreasing. The behavior on the interval $\pi/2 \le x \le \pi$ can be
deduced from this and (10) and (11), which give

$$S\left(x + \frac{\pi}{2}\right) = C(x), \qquad C\left(x + \frac{\pi}{2}\right) = -S(x). \tag{14}$$

In particular, $S(\pi) = 0$ and $C(\pi) = -1$; then

$$S(x + \pi) = -S(x), \qquad C(x + \pi) = -C(x), \tag{15}$$

and, finally,

$$S(x + 2\pi) = S(x), \qquad C(x + 2\pi) = C(x). \tag{16}$$
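Definition (12) can be turned into a small computation. The sketch below sums the series (4) and locates the first positive zero of $C$ by bisection. The function names, the cutoff of 40 terms, and the bracketing interval $[0, 2]$ are assumptions made for the illustration; the bracket is justified by the series, since $C(0) = 1$ and $C(2) \le 1 - 2 + \tfrac{2}{3} < 0$, and by (13), which shows that $C$ has no earlier zero.

```python
import math

def S(x, terms=40):
    """Partial sum of the series (4) for S."""
    total, term = 0.0, x                    # term = (-1)**k x**(2k+1) / (2k+1)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def C(x, terms=40):
    """Partial sum of the series (4) for C."""
    total, term = 0.0, 1.0                  # term = (-1)**k x**(2k) / (2k)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

def half_pi(tol=1e-14):
    """Bisection for the zero of C in [0, 2], where C(0) > 0 > C(2)."""
    lo, hi = 0.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if C(mid) > 0:
            lo = mid                        # the zero lies to the right of mid
        else:
            hi = mid                        # the zero lies at or to the left of mid
    return (lo + hi) / 2

print(2 * half_pi(), math.pi)               # the number defined by (12), doubled
print(S(1.1) ** 2 + C(1.1) ** 2)            # identity (7)
print(S(0.3 + 0.9), S(0.3) * C(0.9) + C(0.3) * S(0.9))   # addition formula (10)
```

The comparison with `math.pi` is only a sanity check; nothing in the construction uses the library value.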


As for the behavior for negative $x$, since the series for $S$ has only odd
exponents and the series for $C$ has only even ones, we have

$$S(-x) = -S(x), \qquad C(-x) = C(x). \tag{17}$$

In particular, $S$ is strictly increasing on $-\pi/2 < x < \pi/2$ and has a nonzero
derivative. So it has an inverse function $A$, and $A$ is also differentiable.
Since $S(A(x)) = x$, differentiation gives $S'(A(x))A'(x) = 1$. We have to calculate
$S'(A(x)) = C(A(x))$, which is easy because

$$S(A(x))^2 + C(A(x))^2 = 1 \quad\text{or}\quad x^2 + C(A(x))^2 = 1,$$

and $C > 0$ on this interval, so $C(A(x)) = \sqrt{1 - x^2}$. Thus,

$$A'(x) = \frac{1}{\sqrt{1 - x^2}}. \tag{18}$$

Exercise 4 Get back to the original definition of the sine and cosine by calculating a suitable
arc length (by the formula of Section 7 of Chapter 4, of course).

Exercise 5 For any two numbers $a$ and $b$ with $a^2 + b^2 = 1$, there is exactly one point $x$ with

$$\cos x = a, \qquad \sin x = b, \qquad 0 \le x < 2\pi.$$

[We now write $\sin x$ and $\cos x$ for $S(x)$ and $C(x)$. This exercise must be done
entirely from the present point of view. No geometric intuition!]

Exercise 6 If $\sin y = \sin x$ and $\cos y = \cos x$, then $x$ and $y$ differ by an integer multiple
of $2\pi$.

8 WEIERSTRASS APPROXIMATION THEOREM

The power series that we have been studying provide one means of approximating
functions by polynomials (the Taylor polynomials). This kind of
approximation is very special. It works only for functions that are analytic
and, in particular, $C^\infty$. Now we shall look at another approximation that works
for all continuous functions.

THEOREM 8.1 (Weierstrass Approximation Theorem) If $f$ is continuous on an interval
$I$, there is a sequence $\{f_n\}$ of polynomials such that $f_n \to f$ uniformly on
every bounded closed subinterval.

Proof To begin with (and this is the meat of the proof, the rest is easy) we shall
suppose that f is continuous on the whole line and vanishes identically for 


$|x| > \tfrac{1}{2}$. In this case we set

$$p_n(x) = c_n(1 - x^2)^n,$$

where $c_n$ is chosen so that

$$\int_{-1}^{1} p_n(x)\,dx = 1. \tag{1}$$

Then we define

$$f_n(x) = \int_{-\infty}^{\infty} f(y)p_n(x - y)\,dy = \int_{-\infty}^{\infty} f(x - y)p_n(y)\,dy. \tag{2}$$

Note that the integrals are not really improper because $f$ vanishes identically
for $|y| > \tfrac{1}{2}$. To go from the first integral to the second, simply
make the change of variable $y = x - z$.

LEMMA 8.2 Each $f_n$ is a polynomial, and $f_n \to f$ uniformly on $|x| \le \tfrac{1}{2}$.


Proof From the first part of formula (2) it is plain that $f_n$ is a polynomial. Just
multiply out $(1 - (x - y)^2)^n$ and remove the powers of $x$ from the
integral.

If $|x| \le \tfrac{1}{2}$, then the second part of formula (2) gives

$$f_n(x) = \int_{-1}^{1} f(x - y)p_n(y)\,dy, \tag{3}$$

for $f(x - y)$ vanishes unless $|x - y| \le \tfrac{1}{2}$, hence unless $|y| \le 1$. This
formula and (1) show that

$$f(x) - f_n(x) = \int_{-1}^{1} [f(x) - f(x - y)]\,p_n(y)\,dy. \tag{4}$$

Let $\varepsilon > 0$ be given, let $M$ be the maximum of $|f|$, and choose $\delta > 0$ so
that if $|y| < \delta$, then $|f(x) - f(x - y)| < \varepsilon$ for every $x$ (this uses the uniform continuity of $f$). Then formula (4) gives

$$|f(x) - f_n(x)| \le \int_{|y| < \delta} \varepsilon\,p_n(y)\,dy + \int_{\delta \le |y| \le 1} 2M\,p_n(y)\,dy \le \varepsilon + 2M \int_{\delta \le |y| \le 1} p_n(y)\,dy. \tag{5}$$

Therefore, the whole problem is to show that

$$\int_{\delta \le |y| \le 1} p_n(y)\,dy \to 0 \quad\text{for each } \delta > 0. \tag{6}$$

Take any $r$, $0 < r < 1$. By the definition of $c_n$ we have

$$\frac{1}{c_n} = \int_{-1}^{1} (1 - x^2)^n\,dx \ge \int_{-r}^{r} (1 - x^2)^n\,dx \ge 2r(1 - r^2)^n,$$

so

$$c_n \le \frac{1}{2r(1 - r^2)^n}.$$

Therefore,

$$\int_{\delta \le |y| \le 1} p_n(y)\,dy \le 2c_n(1 - \delta^2)^n \le \frac{1}{r}\left(\frac{1 - \delta^2}{1 - r^2}\right)^n.$$

If we fix $r < \delta$ and let $n \to \infty$, we see that (6) holds.

Now we have the lemma, and we shall use it to prove the theorem. Suppose
first that $f$ vanishes for $|x| > r$. Then $F(x) = f(2rx)$ vanishes for
$|x| > \tfrac{1}{2}$. For each $\varepsilon > 0$ the lemma provides a polynomial $P_\varepsilon$ such that
$|P_\varepsilon(x) - F(x)| < \varepsilon$ for $|x| \le \tfrac{1}{2}$. Then $Q_\varepsilon(x) = P_\varepsilon(x/2r)$ is a polynomial
that satisfies $|Q_\varepsilon(x) - f(x)| < \varepsilon$ for $|x| \le r$.

Finally, let $f$ be continuous on the open interval $(a, b)$. For each positive
integer $n$, choose a continuous function $\varphi_n$ on $(a, b)$ that is 1 on $[a + 1/n,
b - 1/n]$ and vanishes identically near $a$ and $b$. By what has been proved
there is a polynomial $f_n$ such that $|f_n(x) - \varphi_n(x)f(x)| < 1/n$ on $(a, b)$, and
the sequence $\{f_n\}$ clearly does the job of approximating uniformly on every
bounded closed subinterval.

Exercise 1 How do you finish up the argument if $a = -\infty$ or $b = \infty$?

Exercise 2 How do you finish up the argument if the initial interval is not open but closed
at one or the other end point?

Exercise 3 Draw pictures of the polynomials $p_n$ and discuss why it is reasonable to expect
that the $f_n$ in Lemma 8.2 should approximate $f$.
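The construction of Lemma 8.2 can also be tried numerically. The following sketch is an illustration under stated assumptions rather than part of the proof: the test function `f` (a tent supported on $|x| \le \tfrac{1}{2}$), the midpoint rule used for the integrals, and the 21-point grid on which the error is sampled are all choices made for the example.

```python
def integrate(g, a, b, steps=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def f(x):
    """A continuous test function vanishing for |x| >= 1/2 (hypothetical choice)."""
    return max(0.0, 0.5 - abs(x))

def make_fn(n):
    """Return f_n as in formula (3): the integral of f(x - y) p_n(y) over [-1, 1]."""
    c_n = 1.0 / integrate(lambda y: (1 - y * y) ** n, -1.0, 1.0)   # normalization (1)
    def f_n(x):
        return integrate(lambda y: f(x - y) * c_n * (1 - y * y) ** n, -1.0, 1.0)
    return f_n

def sup_error(n, points=21):
    """Largest value of |f - f_n| on an evenly spaced grid in [-1/2, 1/2]."""
    f_n = make_fn(n)
    xs = [-0.5 + i / (points - 1) for i in range(points)]
    return max(abs(f(x) - f_n(x)) for x in xs)

for n in (5, 20, 80):
    print(n, sup_error(n))
```

The sampled error shrinks as $n$ grows, though slowly: the kernel $p_n$ concentrates near 0 with width of order $1/\sqrt{n}$, which is also what a picture of $p_n$ suggests.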


In Section 12 of Chapter 7 we shall give a substantial generalization of the 
Weierstrass approximation theorem that can be used to find approximations 
by other kinds of functions than polynomials. 


PART II

Metric Spaces

1 THE SPACE $R^n$

DEFINITION 1.1 The real $n$-dimensional space, called $R^n$, is the set of all $n$-tuples of real
numbers.

Geometrically, the one-dimensional space $R^1$ is the line, the two-dimensional
space $R^2$ is the plane, and the three-dimensional space $R^3$ is the
three-dimensional space. In each case the identification of the geometric object with
the set of real numbers, pairs of real numbers, or triples of real numbers
presupposes that a coordinate system is given.

There are three natural algebraic operations on the space $R^n$.

1. Addition: If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, then
$x + y = (x_1 + y_1, \dots, x_n + y_n)$.
2. Scalar multiplication: If $x = (x_1, \dots, x_n)$ and $a$ is a real number, then
$ax = (ax_1, \dots, ax_n)$.
3. Inner product: If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, then

$$(x, y) = \sum_{k=1}^{n} x_k y_k.$$

These operations are natural in the sense that on the one hand they suggest
themselves to some extent, and on the other hand (what is far more important)
they have a geometric significance.

Exercise 1 Show that if $x$ and $y$ are points in the plane, then $x + y$ is the fourth vertex of
the parallelogram of which the other three vertices are $x$, $y$, and $0 = (0, 0)$.
Find a geometric interpretation of $ax$.

Exercise 2 Show that if $x$ and $y$ are points in the plane, then $(x, y) = |x|\,|y| \cos\theta$, where
$|x|$ is the distance from $x$ to 0 and $\theta$ is the angle determined by the half-lines
$0x$ and $0y$.

The same geometrical interpretations can be established in the three-dimensional
space, but the calculations are more complicated. For spaces of
dimension greater than three, the meaning of the phrase "geometrical significance"
will have to be made clear.

There is a common belief that spaces of dimension greater than three are
illusions to which most mathematicians and occasional physicists like Einstein
are subject. This is not quite correct.

Consider the problem of describing the motion of the earth and the moon
around the sun. If coordinates are chosen in the three-dimensional space with
the origin at the sun, then the position of the earth is described by three coordinates
and so is the position of the moon. The two together are described by
six coordinates, that is, by a point in $R^6$. The motion of the two bodies is
described by a "curve" in $R^6$.

Exercise 3 Discuss the description of a box full of gas (e.g., a boiler full of steam) and its
behavior over an interval of time.

Implausible as it sounds, this is one way these things are analyzed.
Before going on, we record the following simple properties of the inner
product, which can be established by inspection.

THEOREM 1.2
(a) $(x, x) \ge 0$.
(b) $(x, y) = (y, x)$.
(c) $(ax, y) = a(x, y)$.
(d) $(x + y, z) = (x, z) + (y, z)$.
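The three operations and properties (a) through (d) are easy to experiment with. The sketch below represents points of $R^n$ as Python lists; the function names and the random sample are assumptions made for the illustration, and a numerical check on a few vectors is not a substitute for the inspection the text asks for.

```python
import random

def add(x, y):
    """Coordinatewise sum of two n-tuples."""
    return [xk + yk for xk, yk in zip(x, y)]

def scale(a, x):
    """Scalar multiple a*x of an n-tuple."""
    return [a * xk for xk in x]

def inner(x, y):
    """Inner product (x, y): the sum of x_k * y_k."""
    return sum(xk * yk for xk, yk in zip(x, y))

def check_theorem_1_2(x, y, z, a, tol=1e-12):
    """Return True if properties (a)-(d) hold for these inputs up to rounding."""
    return (
        inner(x, x) >= 0                                                    # (a)
        and abs(inner(x, y) - inner(y, x)) <= tol                           # (b)
        and abs(inner(scale(a, x), y) - a * inner(x, y)) <= tol             # (c)
        and abs(inner(add(x, y), z) - (inner(x, z) + inner(y, z))) <= tol   # (d)
    )

random.seed(0)                               # fixed seed so the run is repeatable
vectors = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
print(check_theorem_1_2(*vectors, a=2.5))
```

The tolerance `tol` is there only because floating-point sums are rounded; with exact arithmetic the identities hold exactly.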


The inner product actually satisfies a stronger condition than (a), which is

$$(x, x) > 0 \quad\text{unless } x = 0 = (0, \dots, 0). \tag{a'}$$

Later on we shall come upon some inner products which satisfy (a) but not (a').

The inner product in $R^n$ is not a product in the usual sense of the word
unless $n = 1$, for if $x \in R^n$ and $y \in R^n$, then the inner product $(x, y)$ lies not in
$R^n$ but in $R^1$. In $R^2$ there does exist a very important natural product of the
usual kind.


DEFINITION 1.3 If $x, y \in R^2$, then the product $xy$ is defined by

$$xy = (x_1y_1 - x_2y_2,\; x_2y_1 + x_1y_2).$$

Exercise 4 Check that the usual rules of arithmetic hold for this product, that is, that
$xy = yx$, $x(yz) = (xy)z$, and $x(y + z) = xy + xz$.

Exercise 5 Consider $R^1$ as a subset of $R^2$ by identifying the real number $\alpha$ with the point
$(\alpha, 0)$ in $R^2$. (This just identifies the real line with the $x_1$ axis in $R^2$.) Show
that this identification is consistent with all four operations on $R^2$. (If $\alpha$ and $\beta$
are real, you can form the sum in $R^1$ and then identify it with a point in $R^2$.
On the other hand, you can identify $\alpha$ and $\beta$ individually with points in $R^2$ and
then form the sum in $R^2$. The problem is to show that the two procedures
yield the same result, and so on.)

Exercise 6 More generally, consider $R^n$ as a subset of $R^{n+1}$ by identifying the point $x =
(x_1, \dots, x_n)$ with the point $(x_1, \dots, x_n, 0)$ in $R^{n+1}$. Show that this identification
is consistent with the three operations (addition, scalar multiplication, and inner product).

DEFINITION 1.4 In situations where the multiplication on $R^2$ plays a role, $R^2$ is usually called
the complex plane or the complex number system, and the points of $R^2$ are called
complex numbers.

The complex number $(0, 1)$ is usually called $i$. According to the definition
of multiplication and the convention of Exercise 5, we have

$$i^2 = (-1, 0) = -1. \tag{1}$$

If $x = (x_1, x_2)$ is any point of $R^2$, then by Exercise 5 we have

$$x = (x_1, x_2) = (x_1, 0) + (0, x_2) = x_1 + x_2 i.$$

Thus, we get the following result.

THEOREM 1.5 Every complex number $z$ can be expressed uniquely in the form

$$z = x + yi, \quad\text{where } x \text{ and } y \text{ are real and } i^2 = -1. \tag{2}$$

Exercise 7 Prove the uniqueness.

This is the usual form in which complex numbers are written, rather than
with subscripts $x_1 + x_2 i$ or as pairs $(x_1, x_2)$. The numbers $x$ and $y$ are called
the real and imaginary parts of $z$ and are written $\operatorname{Re} z$ and $\operatorname{Im} z$.
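The product of Definition 1.3 can be compared with ordinary complex arithmetic in a few lines. This is a sketch only; the names `mul` and `as_complex` and the sample pairs are assumptions for the example, and Python's built-in `complex` type plays the role of the form $x + yi$ of formula (2).

```python
def mul(x, y):
    """Product of Definition 1.3 for pairs x = (x1, x2) and y = (y1, y2)."""
    return (x[0] * y[0] - x[1] * y[1], x[1] * y[0] + x[0] * y[1])

def as_complex(x):
    """The pair (x1, x2) written as x1 + x2*i, as in formula (2)."""
    return complex(x[0], x[1])

i = (0, 1)
print(mul(i, i))                        # formula (1): the pair (-1, 0)

x, y = (1, 2), (3, 4)
print(mul(x, y))                        # product computed from Definition 1.3
print(as_complex(x) * as_complex(y))    # the same product in the form x + yi
```

Both computations give the pair $(-5, 10)$, that is, $-5 + 10i$.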


Exercise 8 The multiplication formula of Definition 1.3 seems mysterious. Show that it
follows from formula (2) and the usual rules of arithmetic. [Thus, formula (2)
is what should be remembered, not Definition 1.3.]

DEFINITION 1.6 The conjugate of the complex number $z = x + yi$, $x$ and $y$ real, is the
complex number

$$\bar{z} = x - yi.$$

Geometrically, $\bar{z}$ is just the reflection of $z$ across the $x$ axis.

Exercise 9 Show that

$$\overline{z + w} = \bar{z} + \bar{w}, \qquad \overline{zw} = \bar{z}\,\bar{w}, \qquad |\bar{z}| = |z|, \qquad\text{and}\qquad z\bar{z} = |z|^2.$$

(Recall that if $z \in R^2$, then $|z|$ is the distance from $z$ to 0; that is, if $z = x + yi$,
then $|z|^2 = x^2 + y^2$.)

Exercise 10 Use the fact that $z\bar{z} = |z|^2$ to show that every nonzero complex number has a
reciprocal. Calculate $(2 - 3i)/(3 + i)$; that is, put it in the form $x + yi$ with
$x$ and $y$ real.

Remark It can be shown (not easily) that if $n > 2$, there is no multiplication on $R^n$ with
the properties listed in Exercises 4, 5, and 10.

The complex numbers of absolute value 1 are precisely the ones on the
unit circle, and hence precisely the ones of the form $(\cos\theta, \sin\theta) = \cos\theta +
i\sin\theta$. This representation is unique if $\theta$ is restricted to lie in the interval
$[0, 2\pi)$. If $z$ is any complex number $\neq 0$, then $z/|z|$ has absolute value 1; so $z$
can be written uniquely in the form

$$z = r(\cos\theta + i\sin\theta), \qquad r > 0, \qquad 0 \le \theta < 2\pi. \tag{3}$$

The number $r$ is just $|z|$. The number $\theta$ is called the principal value of the
argument of $z$.

Exercise 11 If $z = r(\cos\theta + i\sin\theta)$, $r > 0$, then $r = |z|$. If $z \neq 0$, then $\theta$ differs from the
principal value of the argument of $z$ by an integer multiple of $2\pi$. (Any such $\theta$
is called an argument of $z$.)

Exercise 12 If $z = r(\cos\theta + i\sin\theta)$ and $w = s(\cos\varphi + i\sin\varphi)$, then

$$zw = rs(\cos(\theta + \varphi) + i\sin(\theta + \varphi));$$

hence

$$z^k = r^k(\cos k\theta + i\sin k\theta), \qquad k \text{ an integer}.$$
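The polar form (3) and the two formulas of Exercise 12 can be checked on sample values before they are proved. The sketch below assumes a helper `polar` (a name chosen here) and uses Python's built-in complex arithmetic; the particular radii and angles are arbitrary.

```python
import math

def polar(r, theta):
    """The complex number r(cos theta + i sin theta) of formula (3)."""
    return complex(r * math.cos(theta), r * math.sin(theta))

z, w = polar(2.0, 0.4), polar(1.5, 1.1)

# Moduli multiply and arguments add:
print(abs(z * w - polar(2.0 * 1.5, 0.4 + 1.1)))

# The power formula with k = 5:
print(abs(z ** 5 - polar(2.0 ** 5, 5 * 0.4)))

# Recovering r and the argument from z:
print(abs(z), math.atan2(z.imag, z.real))
```

The first two printed differences are at the level of rounding error. Note that `math.atan2` returns an argument in $(-\pi, \pi]$, which coincides with the principal value of the text only when that value lies in $[0, \pi]$.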


Exercise 13 Find the three solutions to the equation $z^3 = i$.

The inner product in $R^2$ is expressed in terms of the product by the formula

$$(z, w) = \operatorname{Re} z\bar{w}, \tag{4}$$

and from this it follows that

$$(iz, z) = 0. \tag{5}$$

Exercise 14 Also deduce formula (5) from Exercises 2 and 12. (Exercise 12 shows that $iz$ is
obtained from $z$ by a rotation through a counterclockwise angle of 90°.)

2 ABSOLUTE VALUE IN $R^n$

It is possible to define an "absolute value" in $R^n$, which plays a role very much
like that of the usual absolute value of a real number. In $R^2$ and $R^3$ the
geometric interpretation is that the absolute value is the distance from the point
to the origin.

DEFINITION 2.1 If $x = (x_1, \dots, x_n)$, then the absolute value of $x$ is the number

$$|x| = \sqrt{(x, x)} = \sqrt{\sum_{k=1}^{n} x_k^2}.$$

The basic properties of the absolute value are as follows.

THEOREM 2.2
(a) $|x| \ge 0$, and $|x| = 0$ only if $x = 0$.
(b) $|ax| = |a|\,|x|$ if $a$ is a real number.
(c) $|x + y| \le |x| + |y|$.

As indicated already, the point 0 in $R^n$ is the point $(0, \dots, 0)$.

Properties (a) and (b) are obvious, but property (c) is not. It is based on
the following theorem, which is famous in its own right and is called the Cauchy-Schwarz
inequality. Cauchy is the same fellow that appeared on the scene with
Cauchy sequences. Schwarz is another fellow. The Russians call it the
Buniakowski inequality.

THEOREM 2.3 (Cauchy-Schwarz Inequality) If $(x, y)$ is an inner product and $|x| =
\sqrt{(x, x)}$, then

$$|(x, y)| \le |x|\,|y|.$$

The theorem is put this way so that it can be used not only for the inner product
and absolute value in $R^n$, but for any inner product satisfying the conditions
(a) through (d) of Theorem 1.2.


Proof If $\alpha$ is any real number, then according to (b), (c), and (d) of Theorem 1.2,
we have

$$(x + \alpha y, x + \alpha y) = (x, x) + (x, \alpha y) + (\alpha y, x) + (\alpha y, \alpha y) = |x|^2 + 2\alpha(x, y) + \alpha^2|y|^2.$$

Therefore, according to (a),

$$|x|^2 + 2\alpha(x, y) + \alpha^2|y|^2 \ge 0 \quad\text{for all real } \alpha. \tag{1}$$

If we assume that $(y, y) \neq 0$ and take $\alpha = -(x, y)/(y, y)$ in (1), then we get

$$|x|^2 - \frac{(x, y)^2}{|y|^2} \ge 0,$$

which is just what is needed to prove the theorem.

Exercise 1 Go back to formula (1) to show that if $(y, y) = 0$, then $(x, y) = 0$, so that the
theorem holds in this case, too.

Exercise 2 The mysterious value of $\alpha$ that was used in the above proof is simply the one
that minimizes the function

$$f(\alpha) = |x|^2 + 2\alpha(x, y) + \alpha^2|y|^2.$$

In order to establish condition (c) in Theorem 2.2 we use the Cauchy-Schwarz
inequality as follows:

$$|x + y|^2 = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) \le |x|^2 + 2|x|\,|y| + |y|^2 = (|x| + |y|)^2.$$

Then take the square root on both sides.
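Both inequalities, and the special value of $\alpha$ used in the proof, can be observed on random vectors. The following is a hedged sketch: the function names, the Gaussian sample in $R^4$, and the rounding tolerance `tol` are assumptions made for the illustration.

```python
import math
import random

def inner(x, y):
    """Inner product in R^n."""
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    """Absolute value |x| = sqrt((x, x)) of Definition 2.1."""
    return math.sqrt(inner(x, x))

def f(alpha, x, y):
    """The quadratic |x|^2 + 2*alpha*(x, y) + alpha^2*|y|^2 of formula (1)."""
    return inner(x, x) + 2 * alpha * inner(x, y) + alpha ** 2 * inner(y, y)

def holds(x, y, tol=1e-12):
    """Check Cauchy-Schwarz and the triangle inequality for one pair x, y."""
    s = [xk + yk for xk, yk in zip(x, y)]
    return (abs(inner(x, y)) <= norm(x) * norm(y) + tol
            and norm(s) <= norm(x) + norm(y) + tol)

random.seed(1)
pairs = [([random.gauss(0, 1) for _ in range(4)],
          [random.gauss(0, 1) for _ in range(4)]) for _ in range(1000)]
print(all(holds(x, y) for x, y in pairs))

x, y = pairs[0]
alpha = -inner(x, y) / inner(y, y)       # the value of alpha used in the proof
print(f(alpha, x, y) >= -1e-12)          # formula (1) at that alpha
print(f(alpha, x, y) <= min(f(alpha + 0.1, x, y), f(alpha - 0.1, x, y)))
```

The last line reflects Exercise 2: moving $\alpha$ away from $-(x, y)/(y, y)$ in either direction increases $f$.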


3 METRIC SPACES

The best way to discuss convergence and continuity in higher-dimensional spaces
is to do it abstractly. There are two advantages: generality and simplicity.
The results apply not only to the plane and the three-dimensional space, which
are the guiding examples, but to $R^n$ and many other interesting situations.
And even in the three-dimensional space, the abstraction tends to make the
notation simpler and the ideas clearer. What is needed is a notion of distance.

DEFINITION 3.1 A metric space is a set $X$ on which there is a distance subject to the following
conditions:

(a) $d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$.
(b) $d(x, y) = d(y, x)$.
(c) $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

Of course, $d(x, y)$ denotes the distance from $x$ to $y$. It is remarkable that such
simple conditions are adequate to develop the elementary properties of convergence
and continuity.

These conditions are geometrically obvious in the case of the line, the plane,
or the three-dimensional space. The last one, for instance, can be read as saying
that the distance from a point $x$ straight to a point $z$ is at most as long as the
distance around by way of a point $y$.

THEOREM 3.2 $R^n$ is a metric space if the distance is defined by

$$d(x, y) = |x - y|.$$

Proof Each of the conditions (a), (b), and (c) follows from the corresponding
one in Theorem 2.2.

It is plain how to define convergent sequences and continuous functions
on metric spaces.

DEFINITION 3.3 $x_n \to x$ in the metric space $X$ if for each positive number $\varepsilon$ there is a positive
integer $n_0$ such that if $n \ge n_0$, then $d(x_n, x) < \varepsilon$.

DEFINITION 3.4 A function $f$ from a metric space $X$ to a metric space $Y$ is continuous at a point
$a \in X$ if for each positive number $\varepsilon$ there is a positive number $\delta$ such that if
$d(x, a) < \delta$, then $d(f(x), f(a)) < \varepsilon$. $f$ is continuous on $X$, or simply
continuous, if it is continuous at every point.

THEOREM 3.5 The function $f$ is continuous at the point $a$ if and only if it has the following
property: If $x_n \to a$, then $f(x_n) \to f(a)$.


Proof Let $f$ be continuous at $a$, let $x_n \to a$, and let $\varepsilon > 0$ be given. Choose $\delta$ in
accordance with the definition, and choose $n_0$ so that if $n \ge n_0$, then
$d(x_n, a) < \delta$; hence $d(f(x_n), f(a)) < \varepsilon$. Since this can be done for every
positive $\varepsilon$, it follows that $f(x_n) \to f(a)$.

Now suppose that $f$ is not continuous at $a$. Then there is some
positive $\varepsilon$ for which there is no $\delta$. In particular, for each positive integer $n$,
there is a point $x_n$ satisfying

$$d(x_n, a) < \frac{1}{n} \quad\text{and}\quad d(f(x_n), f(a)) \ge \varepsilon.$$

(Otherwise $1/n$ would be a $\delta$!) It is plain that $x_n \to a$, but $f(x_n) \not\to f(a)$.

Exercise 1 Show that a sequence cannot converge to two different points.

Exercise 2 Define a Cauchy sequence in a metric space.

Exercise 3 Define a uniformly continuous function.

4 FUNCTION SPACES

The best models to keep in mind when thinking about metric spaces are the
plane and the three-dimensional space. It is important, too, to realize that
there are metric spaces quite different from these. The following ones are very
important in their own right.

Let $I$ be any set, and let $\mathcal{B}(I)$ be the set of all bounded real-valued functions
on $I$. An "absolute value" can be defined on $\mathcal{B}(I)$ by the formula

$$\|x\| = \sup\{|x(t)| : t \in I\}, \tag{1}$$

and then a distance by the formula

$$D(x, y) = \|x - y\|. \tag{2}$$

Exercise 1 Show that the absolute value just defined satisfies the three conditions in
Theorem 2.2, and then that the distance satisfies the conditions in Definition 3.1.

Convergence in the space $\mathcal{B}(I)$ is an old friend (a wolf?) in new clothing.
Inspection of the definition shows that it is nothing but uniform convergence of
functions. In these terms Theorem 4.2 of Chapter 6 reads

THEOREM 4.1 In the space $\mathcal{B}(I)$ every Cauchy sequence converges.


It is clear that any subset of a metric space is a metric space. The distance
is already defined! Let $I = [a, b]$ be a closed bounded interval, and let $\mathcal{R}(I)$
and $\mathcal{C}(I)$ denote the Riemann integrable functions and the continuous functions
on $I$. Both are subsets of $\mathcal{B}(I)$, so both are metric spaces. Combining Theorem
4.1 with Theorem 4.3 of Chapter 6 we get

THEOREM 4.2 In the space $\mathcal{C}(I)$ every Cauchy sequence converges.

Combining Theorem 4.1 and Theorem 4.4 of Chapter 6 we get

THEOREM 4.3 In the space $\mathcal{R}(I)$ every Cauchy sequence converges. Moreover, the function

$$S(x) = \int_a^b x(t)\,dt$$

is continuous.

In the second part of the theorem we are looking at $S$ as a function from
the metric space $\mathcal{R}(I)$ to the metric space $R^1$ (the line) and are using Definition
3.4. The assertion depends on the sequential characterization of continuity
given in Theorem 3.5.

DEFINITION 4.4 A metric space is complete if every Cauchy sequence converges.

Exercise 2 Show that an open interval is not complete.

Exercise 3 Show that the rational numbers are not complete.

There is an entirely different way to define a metric or distance on the
space $\mathcal{C}(I)$ that is also very important. First, an "inner product" is defined by
the formula

$$(x, y) = \int_a^b x(t)y(t)\,dt, \tag{3}$$

and then an absolute value by the usual formula

$$|x| = \sqrt{(x, x)}, \tag{4}$$

and finally a distance by the usual formula

$$d(x, y) = |x - y|. \tag{5}$$

Exercise 4 Show that this inner product and absolute value satisfy the conditions in
Theorems 1.2 and 2.2. Hence the Cauchy-Schwarz inequality holds, and the
distance is indeed a distance in the sense of Definition 3.1.

Exercise 5 With this distance the space $\mathcal{C}(I)$ is not complete.
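How different the two distances on the continuous functions can be is visible in a simple example. The sketch below takes $I = [0, 1]$ and the functions $x_n(t) = t^n$; these choices, the function names, the sampling grid used to approximate the supremum in (1), and the midpoint rule used for the integral in (3) are all assumptions made for the illustration.

```python
import math

def sup_distance(x, y, a=0.0, b=1.0, samples=1001):
    """Approximate D(x, y) of formula (2) by the maximum of |x - y| on a grid."""
    ts = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(x(t) - y(t)) for t in ts)

def integral_distance(x, y, a=0.0, b=1.0, steps=5000):
    """Approximate d(x, y) of formula (5) with a midpoint rule for the integral (3)."""
    h = (b - a) / steps
    total = sum((x(a + (i + 0.5) * h) - y(a + (i + 0.5) * h)) ** 2
                for i in range(steps))
    return math.sqrt(h * total)

def power(n):
    """The continuous function t -> t**n on [0, 1]."""
    return lambda t: t ** n

def zero(t):
    return 0.0

for n in (1, 10, 100):
    print(n, sup_distance(power(n), zero), integral_distance(power(n), zero))
```

In the distance $d$ the functions $t^n$ approach the zero function, since $d(x_n, 0) = 1/\sqrt{2n + 1}$, while $D(x_n, 0) = 1$ for every $n$. So a sequence can converge in one metric and not in the other.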


The space $\mathcal{C}(I)$ with this distance resembles the spaces $R^n$ very closely in
many respects. It is considered their closest "infinite-dimensional" analog.
However, the fact that it is not complete is a serious disadvantage. This can be
remedied by the addition of certain discontinuous functions which are "limits"
of Cauchy sequences that do not have continuous limits. One might think, for
example, of adding all Riemann integrable functions. These have to be added,
but still the space [now $\mathcal{R}(I)$] is not complete. More complicated functions
than the Riemann integrable ones must be used.

5 EQUIVALENT METRICS

We have just seen an example of two metrics on the space $\mathcal{C}(I)$ which are
quite different. Not only are the actual numbers $D(x, y)$ and $d(x, y)$ different
for given points $x$ and $y$, but the Cauchy sequences, the convergent sequences,
the continuous functions, and so on, are quite different in the two cases, as can
be seen from the fact that $\mathcal{C}(I)$ is complete with the metric $D$, but is not
complete with the metric $d$. Now we shall look at the opposite situation to see
when two metrics, although they may be different, must produce the same
Cauchy sequences, the same convergent sequences, the same continuous functions,
and so on.

DEFINITION 5.1 Two metrics $d$ and $D$ on a set $X$ are equivalent if there exist positive numbers
$m$ and $M$ such that

$$mD(x, y) \le d(x, y) \le MD(x, y) \quad\text{for all } x, y \in X.$$

Two absolute values are equivalent if there exist positive numbers $m$ and $M$
such that

$$m\|x\| \le |x| \le M\|x\| \quad\text{for all } x \in X.$$

It is plain that if two absolute values are equivalent, then the corresponding
metrics are equivalent. It is also plain that if two metrics are equivalent, then
they do produce the same Cauchy sequences, the same convergent sequences,
and the same continuous functions. (The converse of this remark is almost,
but not quite, correct. Can you give an example on $X = [0, 1]$?)

Consider $R^n$ with its initial absolute value defined by

$$|x| = \sqrt{\sum_{k=1}^{n} x_k^2} \tag{1}$$

and with a new one defined by

$$\|x\| = \max\{|x_k| : k = 1, \dots, n\}. \tag{2}$$


Exercise 1 Verify that $\|x\|$ is an absolute value and that

$$\|x\| \le |x| \le \sqrt{n}\,\|x\|. \tag{3}$$

Now, let us consider $R^n$ from a different point of view. An $n$-tuple of real
numbers is (by definition, in fact!) a real-valued function on the set

$$I_n = \{1, \dots, n\}.$$

In general, an $n$-tuple in a set $X$ is a function from $I_n$ into $X$. The notation is
like the notation for sequences. If $x$ is an $n$-tuple, then $x_k$ is usually written in
place of $x(k)$.

From this point of view, $R^n$ is exactly the same set as $\mathcal{B}(I_n)$. Moreover,
the addition and scalar multiplication in $R^n$ are the same as those in $\mathcal{B}(I_n)$.

Exercise 2 Check the last statement.

The absolute value defined in formula (2) is just the absolute value on $\mathcal{B}(I_n)$.
All of which gives a theorem.

THEOREM 5.2 $R^n$ is complete.

Proof By Theorem 4.1, $\mathcal{B}(I_n)$ is complete with the metric (2), and by formula (3)
the metrics (1) and (2) are equivalent; so they have both the same Cauchy
sequences and the same convergent sequences.

The fact of the matter, and we shall prove it in Section 10, is that any two
absolute values on $R^n$ are equivalent. This is useful to know, although in specific
cases the equivalence is usually easy to prove, just as it was above.

Exercise 3 Show that

$$\|x\|_1 = \sum_{k=1}^{n} |x_k|$$

is an absolute value on $R^n$, and that it is equivalent to the two already defined.

The notion of equivalent metrics certainly is not necessary just to prove
that $R^n$ is complete. It is a notion that is needed later, and it does give the
nice proof above. However, a proof can also be made along the following
lines. It will simplify the notation (avoid double subscripts) to write the
$n$-tuples as functions on $I_n$.

Exercise 4 A sequence $\{x_m\}$ in $R^n$ is Cauchy or convergent if and only if each of the sequences
$\{x_m(k)\}$, $k = 1, \dots, n$, has the same property.

Exercise 5 Use Exercise 4 and the fact that $R^1$ is complete to show that $R^n$ is complete.
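Inequality (3), which drives the proof of Theorem 5.2, can be spot-checked on random points. This is an illustrative sketch only: the names `abs2` and `abs_max`, the random sample, and the rounding tolerance are assumptions, and the check does not replace the verification asked for in Exercise 1.

```python
import math
import random

def abs2(x):
    """The absolute value of formula (1): square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))

def abs_max(x):
    """The absolute value of formula (2): the largest |x_k|."""
    return max(abs(v) for v in x)

def satisfies_3(x, tol=1e-12):
    """Check ||x|| <= |x| <= sqrt(n) ||x|| for one point x of R^n."""
    n = len(x)
    return (abs_max(x) <= abs2(x) + tol
            and abs2(x) <= math.sqrt(n) * abs_max(x) + tol)

random.seed(3)
samples = [[random.uniform(-10, 10) for _ in range(n)]
           for n in range(1, 8) for _ in range(200)]
print(all(satisfies_3(x) for x in samples))

# Both bounds in (3) are attained, so the constants 1 and sqrt(n) cannot be improved:
print(abs_max([1, 0, 0]), abs2([1, 0, 0]))
print(abs2([1, 1, 1, 1]), math.sqrt(4) * abs_max([1, 1, 1, 1]))
```

The two extreme cases show where equality occurs: a point on a coordinate axis for the left inequality, and a point with all coordinates equal for the right one.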
6 OPEN AND CLOSED SETS

By analogy with everyday terminology in $R^3$, balls and spheres are defined as
follows:

DEFINITION 6.1 In a metric space $X$ the open ball with center $a$ and radius $r > 0$ is the set

$$B(a; r) = \{x : d(x, a) < r\}.$$

The closed ball with center $a$ and radius $r$ is the set

$$\bar{B}(a; r) = \{x : d(x, a) \le r\}.$$

The sphere with center $a$ and radius $r$ is the set

$$S(a; r) = \{x : d(x, a) = r\}.$$

It is always assumed, unless the contrary is stated explicitly, that the radius of a
ball or sphere is positive.

In $R^1$ the open and closed balls with center $a$ are the open and closed
intervals with center $a$.

DEFINITION 6.2 A set $G \subset X$ is open in $X$ if for every point $a \in G$ there is some ball with
center $a$ that is contained in $G$. A set $F \subset X$ is closed in $X$ if its complement
$X - F$ is open in $X$.

Another way to state the definition is that a set $G \subset X$ is open in $X$ if for
every point $a \in G$ there is a positive number $\delta$ such that if $x \in X$ and $d(x, a) < \delta$,
then $x \in G$.

Usually a set is called simply open or closed, rather than open or closed
in $X$, but the latter is always understood. A set is never open or closed on its
own, but only open or closed in some given metric space $X$. For example, an
open interval in $R^1$ is an open set in $R^1$, but is not an open set in $R^2$ when we
consider $R^1 \subset R^2$ in the usual way.

Exercise 1 In any metric space $X$, the empty set and $X$ itself are both open, and hence
both closed.

THEOREM 6.3 The open ball is an open set. The closed ball and the sphere are closed sets.

Proof Let $b$ be any point of the open ball $B(a; r)$, and let $\delta = r - d(b, a)$, which
is positive by definition of the open ball. If $d(x, b) < \delta$, then

$$d(x, a) \le d(x, b) + d(b, a) < r.$$


Thus, $B(b; \delta) \subset B(a; r)$, and $B(a; r)$ is open.

Let $b$ be a point in the complement of the closed ball $\bar{B}(a; r)$. Then
$\delta = d(b, a) - r$ is positive, and if $d(x, b) < \delta$, then, since $d(a, b) \le
d(a, x) + d(x, b)$, we have

$$d(a, x) \ge d(a, b) - d(x, b) > r.$$

Thus, $B(b; \delta)$ is contained in the complement of $\bar{B}(a; r)$, which shows that
the complement is open, and hence that $\bar{B}(a; r)$ itself is closed.

Exercise 2 Do the case of the sphere yourself.

The theorem relating continuous functions to open and closed sets is the
following.

THEOREM 6.4 A function $f$ from $X$ to $Y$ is continuous if and only if it has the property that
$f^{-1}(G)$ is open in $X$ whenever $G$ is open in $Y$.

When $G$ is any set in $Y$, the set $f^{-1}(G)$ is the set in $X$ defined by

$$f^{-1}(G) = \{x : x \in X \text{ and } f(x) \in G\}.$$

Proof Suppose that $f$ is continuous, let $G$ be an open set in $Y$, and let $a$ be any
point in $f^{-1}(G)$. Then $b = f(a) \in G$, and since $G$ is open, there is a
positive number $\varepsilon$ such that if $d(y, b) < \varepsilon$, then $y \in G$. Use the continuity
of $f$ to find a positive number $\delta$ such that if $d(x, a) < \delta$, then $d(f(x), b) < \varepsilon$;
hence $f(x) \in G$; hence $x \in f^{-1}(G)$. Then $B(a; \delta) \subset f^{-1}(G)$, so $f^{-1}(G)$ is
open.

Suppose that $f$ has the property indicated in the theorem, let $a$ be
any point of $X$, and let $\varepsilon > 0$ be given. By Theorem 6.3, $G = B(f(a); \varepsilon)$
is open, and, therefore, so is $f^{-1}(G)$. Hence, there is a $\delta > 0$ such that if
$d(x, a) < \delta$, then $x \in f^{-1}(G)$; hence $f(x) \in G = B(f(a); \varepsilon)$; hence
$d(f(x), f(a)) < \varepsilon$. This shows that $f$ is continuous at the arbitrary point $a$.

The same kind of theorem holds for closed sets.

THEOREM 6.5 A function $f$ from $X$ to $Y$ is continuous if and only if it has the property that
$f^{-1}(F)$ is closed in $X$ whenever $F$ is closed in $Y$.

Proof This follows from Theorem 6.4 and the simple identity

$$f^{-1}(Y - F) = X - f^{-1}(F).$$
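The proof of Theorem 6.4 relies on Theorem 6.3, and the choice of radius made in the proof of Theorem 6.3 can be watched at work in the plane. The sketch below is an illustration under stated assumptions: the metric is the usual distance in $R^2$, the center, radius, and point `b` are arbitrary sample values, and `random_point_in_ball` is a hypothetical sampler written for the example.

```python
import math
import random

def dist(x, y):
    """The usual distance |x - y| in R^n."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

def random_point_in_ball(center, radius):
    """A point of R^2 at distance less than radius from center."""
    angle = random.uniform(0.0, 2.0 * math.pi)
    rho = 0.999 * radius * random.random()        # stay strictly inside the ball
    return (center[0] + rho * math.cos(angle), center[1] + rho * math.sin(angle))

random.seed(2)
a, r = (0.0, 0.0), 1.0
b = (0.6, 0.3)                                    # a point of the open ball B(a; r)
delta = r - dist(b, a)                            # the radius chosen in the proof
points = [random_point_in_ball(b, delta) for _ in range(10000)]

print(all(dist(x, b) < delta for x in points))    # the samples lie in B(b; delta)
print(all(dist(x, a) < r for x in points))        # and therefore in B(a; r)
```

Any larger radius around `b` would reach outside $B(a; r)$ in the direction away from $a$, so $\delta = r - d(b, a)$ is the largest radius that works in this example.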


136 


7/metric spaces 


Remark 


THEOREM 
6.6 


Proof 


THEOREM 
6.7 


Proof 


This theorem can be used to prove that the closed ball and the sphere are 
closed sets in the following way. First we show that the function f(x) = d(x, a) 
is continuous from X to the real numbers. The triangle inequality gives 
d(x, a) < d(x, y) + d(y, a); hence 


d(x, a) — d(y, a) < d(x, y). 


Interchange of x and y gives 


d(y, a) — d(x, a) < d(y, x) = d(x, y), 


and the two together give 


|a(x, 2) — d(y, a)| < d(x, y). (1) 


This shows that f is continuous: Given an ¢, we can take 6 = e. 

Now, B(a;r) = f-'(F), where F is the closed interval [0, 7], and S(@;7) = 
f\(F), where F is the single point {r}. It is easy to see that both [0, r] and {r} 
are Closed in R!, so Theorem 6.5 shows that the closed ball and the sphere are 
both closed. 

The same kind of argument could be used to show that the open ball is 
open, except that this fact was used in the proof of Theorem 6.4. 
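Inequality (1) can be spot-checked numerically. The following is only an illustrative sketch, assuming the Euclidean metric on R³; the helper names dist and largest_violation are hypothetical and do not come from the text.

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def largest_violation(a, trials=1000, dim=3, seed=0):
    """Largest value of |d(x,a) - d(y,a)| - d(x,y) over random pairs x, y.

    Inequality (1) says this quantity is never positive; only rounding
    error can push it above zero.
    """
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x = tuple(rng.uniform(-10, 10) for _ in range(dim))
        y = tuple(rng.uniform(-10, 10) for _ in range(dim))
        gap = abs(dist(x, a) - dist(y, a)) - dist(x, y)
        worst = max(worst, gap)
    return worst
```

A random check of this kind proves nothing, of course; it only makes visible that the choice δ = ε works for the function f(x) = d(x, a).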

Here is a direct characterization of closed sets. 


A set F ⊂ X is closed in X if and only if it has the following property:
If x_n → x and each x_n is in F, then x ∈ F.


Suppose that F is closed and that x_n → x, with each x_n ∈ F. If x ∈ X − F,
then since X − F is open, there is a ball B(x; r) ⊂ X − F, which is clearly
impossible, since x_n belongs to any such ball if n is large enough.

Now suppose that F is not closed, so that X − F is not open. Then
there is some point a ∈ X − F such that every ball B(a; r) intersects F.
In particular, for each positive integer n, there is a point x_n ∈ F ∩
B(a; 1/n). Clearly x_n → a, each x_n is in F, but a is not in F; so F does not
have the property described in the theorem.


This characterization of closed sets suggests a relation between closed sets 
and complete metric spaces. 


Let X ⊂ Y. If X is complete, then X is a closed subset of Y. If X is a
closed subset of Y and Y is complete, then X is complete.


Suppose that X is complete, and let x_n → y with each x_n ∈ X. Then the
sequence {x_n} is Cauchy, and since X is complete, there is a point x ∈ X


COROLLARY 
6.8 


Exercise 3 


Exercise 4 


Exercise 5 


Exercise 6 


Exercise 7 


Exercise 8 


Exercise 9 


Exercise 10 




with x_n → x. However, a sequence cannot converge to two different
points, so y = x ∈ X. This shows that X is closed in Y.

Suppose that X is closed in Y and that Y is complete, and let {x_n} be
a Cauchy sequence in X. Then {x_n} is also a Cauchy sequence in Y, so
x_n → y for some y ∈ Y. Since X is closed in Y, it follows that y ∈ X.
Therefore, every Cauchy sequence in X converges in X, and X is complete.


The closed ball and the sphere in Rⁿ are complete.


The union of any number of open sets is open. The union of a finite number of 
closed sets is closed. 


The intersection of any number of closed sets is closed. The intersection of a 
finite number of open sets is open. 


Show by example (e.g., in R¹) that the assertions of Exercises 3 and 4 cannot be
improved. 


The closure Ā of a set A ⊂ X is the intersection of all closed sets that contain A.
Show that Ā is closed and that it consists of all x ∈ X such that a_n → x for some
sequence {a_n} in A. Show that the closure of Ā is Ā.
The closure of A ∪ B is Ā ∪ B̄. Intersection, too?


A set A ⊂ X is nowhere dense in X if Ā contains no ball. Show that the union
of a finite number of nowhere dense sets is nowhere dense.
For any A ⊂ X, A not empty, define
d(x, A) = inf{d(x, y) : y ∈ A}.
Show that |d(x, A) − d(y, A)| ≤ d(x, y), and hence that d(x, A) is continuous.
Show that Ā = {x : d(x, A) = 0}.
If F₁ and F₂ are two disjoint closed subsets of a metric space X, then there
is a continuous real-valued function f on X such that
f = 0 on F₁, f = 1 on F₂, 0 ≤ f(x) ≤ 1 for all x ∈ X.
(Hint: Try

f(x) = d(x, F₁) / (d(x, F₁) + d(x, F₂)).)




7 


Exercise 1 


DEFINITION 
7.1


DEFINITION 
7.2


DEFINITION 
7.3 


THEOREM 
7.4 


Proof 


CONNECTED SPACES 


It must not be imagined that the subsets of a metric space are divided neatly 
into the open ones and the closed ones. Most sets are neither open nor closed, 
and some sets are both. 


A metric space X is always both open and closed in itself. The empty set is 
always both open and closed in X. 


A metric space X is connected if no subset is both open and closed, except X
itself and the empty set. If X is not connected, then it is disconnected.


The definition can be rephrased in two ways by making use of the fact 
that a set is open if and only if its complement is closed. 


A metric space X is disconnected if and only if there exist nonempty open sets
G₁ and G₂ such that

G₁ ∪ G₂ = X,  G₁ ∩ G₂ = ∅
(where ∅ is the empty set).
A metric space X is disconnected if and only if there exist nonempty closed sets
F₁ and F₂ such that

F₁ ∪ F₂ = X,  F₁ ∩ F₂ = ∅.


A subset X of R¹ is connected if and only if it is an interval.


If X is not an interval, then there exist real numbers a, b, and c such that
a < b < c, while a and c belong to X and b does not. Define G₁ and G₂ by


G₁ = {x : x ∈ X and x < b},  G₂ = {x : x ∈ X and x > b}.


It is immediately verified that G₁ and G₂ are open, and it is plain from the
definition that they are nonempty and satisfy G₁ ∩ G₂ = ∅ and G₁ ∪
G₂ = X. Therefore, X is disconnected.

Now suppose that X is an interval, but that X is disconnected. Let
F₁ and F₂ be as in the third form of the definition, and let a ∈ F₁ and
c ∈ F₂, with a < c. (If a > c, change the notation.) Let


b = sup{x : x ∈ F₁ and x < c}.


We shall get a contradiction by showing that b ∈ F₁ ∩ F₂, which is
supposed to be empty.


Exercise 2 


THEOREM 
7.5 


Proof 


Exercise 3 


DEFINITION 
7.6 


DEFINITION 
7.7


THEOREM 
7.8 


Proof 




By the definition of the least upper bound, we can find for each n a
point x_n ∈ F₁ satisfying b − 1/n < x_n ≤ b. Thus, x_n → b and x_n ∈ F₁,
so b must belong to F₁, since F₁ is closed in X.


The last statement holds water only if b ∈ X, for F₁ is closed in X, not in R¹.
How do we know that b ∈ X?


Now, let us show that b ∈ F₂. If b = c, we are done. Otherwise b < c,
and, if n is large enough, then y_n = b + 1/n is also < c, hence in X. By
the definition of b, y_n cannot be in F₁, so it must be in F₂. We have
y_n → b and y_n ∈ F₂, and since F₂ is closed in X, it follows that b ∈ F₂.


If f is a continuous function from X to Y, and X is connected, then f(X) is 
connected. 


It is clear that f is a continuous function from X to f(X), so there is no loss
in generality in supposing that Y = f(X). If Y is disconnected, then
Y = G₁ ∪ G₂, where G₁ and G₂ are open, etc. Then X = f⁻¹(G₁) ∪
f⁻¹(G₂), where f⁻¹(G₁) and f⁻¹(G₂) are open, etc.


Verify the etc. 


Theorems 7.4 and 7.5 give the general version of the basic theorem in
Chapter 3 that if f is a continuous real-valued function on an interval I, then
f(I) is an interval. Indeed, Theorem 7.4 says that I is connected, Theorem 7.5
says that f(I) is connected, and then Theorem 7.4 says that f(I) is an interval.

The geometric sense of disconnectedness is that the space splits into two 
parts that are somehow separated from one another. There is another kind of 
connectedness that is perhaps more intuitive and is easier to handle in many 
cases. 


A path, or curve, or arc in a metric space X is a continuous function φ from a
closed bounded interval [σ, τ] into X. The points φ(σ) and φ(τ) are called
the initial and final points of the path, and the path is said to join these two
points.


A metric space X is path connected if any two points can be joined by a path in X.
Every path-connected space is connected.


Let G₁ and G₂ disconnect X, choose a and b in G₁ and G₂, and let φ be a
path joining a and b. Then φ⁻¹(G₁) and φ⁻¹(G₂) disconnect the interval




DEFINITION 
7.9 


Exercise 4 


THEOREM 
7.10 


Proof 


Exercise 5 


Exercise 6 


Exercise 7 


Exercise 8 


THEOREM 
7.11


[σ, τ] on which φ is defined, which is impossible, since an interval is
connected.


The simplest kind of path in Rⁿ is of course the straight-line segment.


The line segment joining the points a and b of Rⁿ is the path φ defined by
φ(t) = (1 − t)a + tb for 0 ≤ t ≤ 1.


Show that the line segment is a path, that is, is continuous. 


In Rⁿ the closed and open balls are path connected. Indeed, the line segment
joining any two points of the ball is a path in the ball. 


Let a and b be two points of the ball B(c; r) and φ be the line segment
joining them. What has to be shown is that φ(t) ∈ B(c; r) for each t.
Writing c = (1 − t)c + tc, we have


|φ(t) − c| = |(1 − t)(a − c) + t(b − c)| ≤ (1 − t)|a − c| + t|b − c|
< (1 − t)r + tr = r.


The same kind of proof works for the closed ball. 
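The estimate in the proof above can be illustrated numerically. This is a minimal sketch, assuming the Euclidean absolute value on Rⁿ; the function names segment, norm, and segment_stays_in_ball are hypothetical, and sampling finitely many values of t is of course no substitute for the proof.

```python
import math

def segment(a, b, t):
    """Point phi(t) = (1 - t) a + t b on the line segment from a to b."""
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))

def norm(x):
    """Euclidean absolute value |x|."""
    return math.sqrt(sum(xi * xi for xi in x))

def segment_stays_in_ball(a, b, c, r, steps=100):
    """Check |phi(t) - c| < r at evenly spaced values of t in [0, 1]."""
    for k in range(steps + 1):
        p = segment(a, b, k / steps)
        if norm(tuple(pi - ci for pi, ci in zip(p, c))) >= r:
            return False
    return True
```

For example, with a = (0.9, 0) and b = (0, 0.9) in the open unit ball of R², every sampled point of the segment stays inside the ball, as the convexity estimate predicts.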


A rectangle in Rⁿ is a set R of the form
R = {x : a_i ≤ x_i ≤ b_i for i = 1, . . . , n},


where a = (a₁, . . . , a_n) and b = (b₁, . . . , b_n) are any two points of Rⁿ with
a_i < b_i for each i. Show that a rectangle is path connected by showing that
the line segment joining any two points of the rectangle is a path in the rectangle.


The points a and b in the rectangle above are the "lower left" and "upper right"
hand vertices of the rectangle. What are the other vertices? What is the
center?


Let R be a rectangle in Rⁿ with center c. Define an absolute value on Rⁿ so
that R is the closed ball B̄(c; 1). Use this to do Exercise 5.


We have been talking about closed rectangles. Define open rectangles, and 
prove that open rectangles are open and that closed rectangles are closed. 


In Rⁿ, n > 1, the sphere S(a; r) is path connected.


Proof 


Exercise 9 


Exercise 10 




To simplify the notation we shall treat the sphere S = S(0; 1). Let a and
b be any two points of S, and suppose first that a and b are not diametrically
opposite; that is, that a ≠ −b. The line segment φ joining a and b is
a path in Rⁿ all right, but it is not a path in S. However, we can
"project" it on S in the following way. Define


ψ(t) = φ(t) / |φ(t)| = ((1 − t)a + tb) / |(1 − t)a + tb|.   (1)

It is geometrically clear that the line segment from a to b does not pass
through 0 [i.e., that the denominator in (1) is ≠ 0] if a and b are not
diametrically opposite, but let us prove it. If (1 − t)a = −tb, then, since
|a| = 1 and |b| = 1,


1 − t = |(1 − t)a| = |−tb| = t.


Consequently, t = 1/2 and (1/2)a = −(1/2)b, so a = −b.

This shows that formula (1) makes perfectly good sense, but it remains
to show that ψ is continuous. First of all, the function |φ(t)| is continuous,
since it is the composite of |x| = d(x, 0) and φ, which are both continuous
(see Section 2). Since |φ(t)| is continuous and ≠ 0, it follows that α(t) =
1/|φ(t)| is also continuous. The proof is finished by the following
exercise, which is entirely similar to the familiar theorem on the product
of two continuous real-valued functions.


Let φ be a continuous function from a metric space X to Rⁿ, and let α be a
continuous real-valued function on X. Then the product ψ(t) = α(t)φ(t) is
continuous from X to Rⁿ.


It remains to treat the case when a = −b. But all we have to do is to
take any third point c, and join a to c and then c to b. This is the point
where the proof breaks down for n = 1. When n = 1, the sphere S(0; 1)
consists of the two points x = ±1, which are diametrically opposite, and
there is no third point to make use of. 
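The projection used in this proof can be sketched in a few lines. The code below is an illustration only, assuming the Euclidean absolute value and the unit sphere S(0; 1); the names norm and projected_path are hypothetical. It divides the segment point φ(t) by its length and refuses the diametrically opposite case, where the segment passes through 0.

```python
import math

def norm(x):
    """Euclidean absolute value |x|."""
    return math.sqrt(sum(xi * xi for xi in x))

def projected_path(a, b, t):
    """psi(t) = phi(t) / |phi(t)|, where phi(t) = (1 - t) a + t b.

    Assumes |a| = |b| = 1 and a != -b, so that phi(t) is never 0.
    """
    phi = tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
    length = norm(phi)
    if length == 0.0:
        raise ValueError("segment passes through 0: a and b are diametrically opposite")
    return tuple(p / length for p in phi)
```

Sampling t between 0 and 1 for a = (1, 0) and b = (0, 1) gives points of absolute value 1 running from a to b along the circle, while a = (1, 0), b = (−1, 0) fails at t = 1/2, exactly as in the computation above.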


In all these examples we have proved that the space is path connected, and 
we know that every path connected space is connected. The question arises as 
to whether there are connected spaces that are not path connected. In fact, 
there are, but examples are not so very easy to produce. Here is one in the 
plane. 


X = {(x, y) : 0 < x ≤ 1 and y = sin(1/x), or x = 0 and −1 ≤ y ≤ 1}.


Draw a picture of the set X and show that it is connected but not path connected. 




THEOREM 
7.12 


Proof 


Exercise 11 


Exercise 12 


Exercise 13 


Exercise 14 


Exercise 15 


For open subsets of R", however, connected and path connected are the 


same. 


If G is a connected open subset of Rⁿ, then any two points of G can be joined by
a polygonal line in G.


A polygonal line is a finite sequence of line segments, each one beginning
where the previous one stopped. This is plainly a path. Let a be a
fixed point of G, and let G₁ be the set of points in G to which a can be
joined by a polygonal line in G, and G₂ be the set of points in G to which a
cannot be joined by a polygonal line in G. We shall show that G₁ and G₂
are both open. Let b ∈ G₁, and choose δ > 0 so that B(b; δ) ⊂ G. If c
is any point of B(b; δ), then the polygonal line from a to b (which exists
because b ∈ G₁) followed by the line segment from b to c gives a polygonal
line from a to c. Hence c ∈ G₁. The same idea shows that G₂ is open.
If b ∈ G₂, let δ > 0 and c be as before. If there were a polygonal line
from a to c, then following it by the line segment from c to b we would have
one from a to b.

Since G₁ and G₂ are both open, and since plainly their union is G and
their intersection is empty, it follows that one of them must be empty.
The empty one is not G₁, for a ∈ G₁. Thus G₂ is empty, and so G₁ = G,
and we are done.


Why are we done? 


Let 𝒮 be a family of connected subsets of a space X.


(a) If all the sets in 𝒮 have a common point, then the union is connected.
(b) If all the sets in 𝒮 intersect some one of them, then the union is connected.
[Hint: Part (b) follows immediately from part (a) by a little trick.]


For each x ∈ X let C(x) be the union of all connected subsets of X that contain x.
C(x) is connected. If x and y are two points, then either C(x) = C(y) or


C(x) ∩ C(y) = ∅.


If X is an open subset of Rⁿ, then each C(x) is open in Rⁿ. [Note that C(x) is
formed relative to X, not relative to Rⁿ.]


The sets C(x) are called the connected components of the metric space X.


What are the connected components of Rⁿ − S(a; r)?




Exercise 16 Every open set in R¹ is the union of a disjoint sequence of open intervals.
(Hint: The open intervals are the connected components. Why can they be 
arranged in a sequence?) 


8 COMPOSITE FUNCTIONS AND SUBSEQUENCES 


If f is a function from a set X to a set Y, and g is a function from Y to Z, then the
composite is the function h from X to Z defined by


h(x) = g(f(x)).


Usually the composite is designated by g∘f. There is a notation that is
convenient in general as an abbreviation, and is particularly convenient in chasing
composite functions around. To say that f is a function from X to Y, we write


f : X → Y   or   X --f--> Y.   (1)
The setup that produces a composite function is then
X --f--> Y --g--> Z.   (2)


There is one unfortunate aspect of the notation. The setup (2) produces the
composite function g∘f, not f∘g. That is, the order is reversed. There are
ways to avoid this, but the notation is so well established that they are impractical.


THEOREM 8.1  Let X --f--> Y --g--> Z. If f is continuous at a point a, and g is continuous at
b = f(a), then g∘f is continuous at a.


It is assumed tacitly, of course, that X, Y, and Z are all metric spaces. 


Proof  Let ε > 0 be given. First use the fact that g is continuous to choose
δ₁ > 0 so that if d(y, b) < δ₁, then d(g(y), g(b)) < ε. Then use the fact
that f is continuous to choose δ > 0 so that if d(x, a) < δ, then d(f(x),
f(a)) < δ₁. Now, if d(x, a) < δ, then


d(g∘f(x), g∘f(a)) < ε.


A special kind of composite function is of particular interest in the next 
section. 


DEFINITION 8.2  Let x be a sequence in a set X. A subsequence of x is a composite function
x∘k, where k is a strictly increasing sequence of positive integers.




THEOREM 
8.3 


Exercise 1 


THEOREM 
8.4 


Proof 


Exercise 2 


Exercise 3 


THEOREM 
8.5 


Proof 


Remember the definitions: A sequence in a set X is a function from the 
positive integers into X. Therefore, if N denotes the set of positive integers, 
then the situation is that 


N --k--> N --x--> X,
so x∘k : N → X is again a sequence in X.
A subsequence is simply a rule that picks out some of the terms of a sequence
(an infinite number, of course) in their proper order. The usual practice is to
write x_{k_i} for the point x(k(i)) = x∘k(i), and to write {x_{k_i}} for the subsequence.


If {x_n} converges to a, then every subsequence converges to a.


Prove the theorem, and discuss its relation to Theorem 8.1. 


Let {x_k} be a bounded sequence of real numbers. Then there is a subsequence
that converges to the limit superior.


The definition of the subsequence is an inductive one based on Lemma 3.2
of Chapter 6. First choose k₁ ≥ 1 so that |x_{k₁} − b| < 1, where b is the
limit superior. This involves using the lemma with δ = 1. Next choose
k₂ > k₁ with |x_{k₂} − b| < 1/k₁. This involves using the lemma with
δ = 1/k₁. Suppose that k₁, k₂, . . . , k_m are already chosen, and choose
k_{m+1} > k_m so that |x_{k_{m+1}} − b| < 1/k_m, which involves using the lemma
with δ = 1/k_m. By the construction, the sequence {k_i} is strictly increasing,
so the sequence {x_{k_i}} is a subsequence. It converges to b, since

|x_{k_{i+1}} − b| < 1/k_i ≤ 1/i.

Show that if {k_i} is any strictly increasing sequence of positive integers, then
k_i ≥ i. (This is what is used in the last inequality above and in Exercise 1.)


Let {x_k} be a bounded sequence of real numbers. There is a subsequence that
converges to the limit inferior. Furthermore, if a is the limit of any convergent
subsequence, then

lim inf x_k ≤ a ≤ lim sup x_k.


Every bounded sequence in Rⁿ has a convergent subsequence.


4 


A set X ⊂ Rⁿ is bounded if there is a number M such that |x| ≤ M for
every x ∈ X. A sequence {x_k} is bounded if |x_k| ≤ M for every k.


DEFINITION 
9.1 


THEOREM 
9.2


Proof of (a) 




To avoid double subscripts we shall write a point x ∈ Rⁿ as x =
(x′, t), where x′ = (x₁, . . . , x_{n−1}) and t = x_n, in which case |x|² =
|x′|² + t². It is immediate that x_k → x if and only if x′_k → x′ and t_k → t.

Now let {x_k} be a bounded sequence in Rⁿ. Then {x′_k} is a bounded
sequence in Rⁿ⁻¹, and by induction there is a subsequence {x′_{k_i}} that
converges to x′ ∈ Rⁿ⁻¹. (Theorem 8.4 starts the induction.) The
sequence {t_{k_i}} is a bounded sequence of real numbers, so by Theorem 8.4
there is a subsequence {t_{k_{i_j}}} that converges to t ∈ R¹. Now, the sequence
{x_{k_{i_j}}} is a subsequence of {x_k} that converges to (x′, t), for t_{k_{i_j}} → t by
definition, and x′_{k_{i_j}} → x′ by Theorem 8.3.


We have used the fact that a subsequence of a subsequence is a subsequence,
for which the setup in terms of the notation of this section is simply


N --i--> N --k--> N --x--> X.


COMPACT SPACES 


In Chapter 3 we discovered the fundamental theorem that a continuous function 
on a closed bounded interval is uniformly continuous and has a maximum and 
minimum, and if it is one to one, then the inverse function is continuous. Now 
we shall discuss the general analog of this theorem. 


A metric space X is compact if every sequence in X has a convergent subsequence. 


The basic general theorem is as follows: 


Let f: X — Y be continuous, and let X be compact. Then 
(a) f(X) is compact. 
(b) f is uniformly continuous. 
(c) If f is one to one, then the inverse function is continuous.


It is assumed tacitly that X and Y are metric spaces. Then f(X), as a 
subset of Y, is also a metric space. 


Let {y_k} be a sequence in f(X). By the definition of f(X), there is a point
x_k ∈ X with f(x_k) = y_k. Since X is compact, there is a subsequence {x_{k_i}}
such that x_{k_i} → x. Then, since f is continuous at x, we have


y_{k_i} = f(x_{k_i}) → f(x).




Proof of (b)  If f is not uniformly continuous, then there is some positive ε for which
there is no δ. Therefore, for each positive integer k, there are points x_k
and y_k in X with


d(x_k, y_k) < 1/k and d(f(x_k), f(y_k)) ≥ ε.   (1)


(Otherwise 1/k would be a δ!) Use the compactness to choose {k_i} so
that x_{k_i} → x. By the first half of formula (1), it follows that y_{k_i} → x.
Therefore, since f is continuous at x, it follows that


f(x_{k_i}) → f(x) and f(y_{k_i}) → f(x),


which clearly contradicts the second half of formula (1).


Proof of (c)  If g is the inverse function, then what we have to prove is that if y_k → y,
with, of course, y_k and y in f(X), then g(y_k) → g(y). Let x_k = g(y_k) and
x = g(y).

If it is not true that x_k → x, then, for some ε > 0, there are infinitely
many x_k that satisfy d(x_k, x) ≥ ε. From these we can pick a subsequence
that converges to some point z. Thus, we have a subsequence {x_{k_i}} such
that


x_{k_i} → z and d(x_{k_i}, x) ≥ ε.


From the second part of this formula it follows that z ≠ x, and from the
first part it follows that


f(z) = lim f(x_{k_i}) = lim y_{k_i} = y = f(x),

which contradicts the fact that f is one to one.

Now the job is to identify the compact subsets of Rⁿ, and in general to
describe compact spaces in some more geometrical way. It is helpful to divide
the property of compactness into two parts.


DEFINITION 9.3  A metric space X is totally bounded if every sequence in X has a Cauchy
subsequence.
THEOREM 9.4  A metric space is compact if and only if it is complete and totally bounded.
Proof  Straight from the definitions it is obvious that if X is complete and totally
bounded, then X is compact; and that if X is compact, then it is totally
bounded. What has to be shown is that if X is compact, then it is complete.


LEMMA 
9.5 


Exercise 1 


THEOREM 
9.6 


Proof 


THEOREM 
9.7 


Proof 


Exercise 2 


Exercise 3 


DEFINITION 
9.8 


THEOREM 
9.9 




If {x_k} is a Cauchy sequence, then there is a subsequence {x_{k_i}} that converges
to some point x ∈ X. And we have the following lemma, valid in
any metric space.


If {x_k} is Cauchy and x_{k_i} → x, then x_k → x.
Prove the lemma.
A subset of Rⁿ is compact if and only if it is closed and bounded.


Theorem 8.5 shows that a bounded set is totally bounded, and Theorem 6.7
shows that a closed subset is complete. This proves half of the theorem.

Theorem 6.7 also shows that a complete subset is closed, so what
remains is to show that a totally bounded subset of Rⁿ is bounded. If the
set X ⊂ Rⁿ is unbounded, we can produce a sequence with no convergent
subsequence as follows. Start with any point x₁ ∈ X. Choose x₂ ∈ X
with |x₂| > |x₁| + 1, then x₃ ∈ X with |x₃| > |x₂| + 1, and in general
x_m ∈ X with |x_m| > |x_{m−1}| + 1. This sequence has the property that the
distance between any two terms is > 1, which rules out any chance of a
convergent subsequence.


In Rⁿ a closed ball, a sphere, and a closed rectangle are all compact.


From their definition these sets are all bounded, and in Section 6 they are 
proved closed. 


If Y is a compact subset of R', then Y is bounded, and both the least upper 
bound and the greatest lower bound belong to Y.


A continuous real-valued function on a compact metric space has a maximum 
and a minimum. 


The "geometric" characterization of a totally bounded space comes next.


The diameter of a set A in a metric space X is the number
δ(A) = sup{d(x, y) : x and y belong to A}.
The metric space X is totally bounded if and only if it has the following
property: For every ε > 0, X is the union of a finite number of subsets of
diameter ≤ ε.




Proof 


Suppose that X is not the union of a finite number of sets of diameter ≤ ε.
We shall produce a sequence with no convergent subsequence. Start with
any point x₁. There must be a point x₂ with d(x₂, x₁) ≥ ε/2, for otherwise
X itself would have diameter ≤ ε. There must be a point x₃ with


d(x₃, x₂) ≥ ε/2 and d(x₃, x₁) ≥ ε/2,


for otherwise X would be the union of the two balls B(x₁; ε/2) and B(x₂; ε/2),
each of which has diameter ≤ ε. In general, there must be a point x_m with


d(x_m, x_j) ≥ ε/2 for j = 1, . . . , m − 1,


for otherwise X would be the union of the balls B(x_j; ε/2), each of which
has diameter ≤ ε. The sequence {x_k} has no Cauchy subsequence, since
the distance between any two terms is ≥ ε/2.
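Read in the other direction, the selection process in this proof is a procedure for covering a space by small balls: keep picking points at distance at least ε/2 from all earlier picks until none is left. Here is a minimal sketch for a finite set of points in the plane, assuming the Euclidean metric; the names dist and greedy_centers are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def greedy_centers(points, eps):
    """Pick x_1, x_2, ... with d(x_m, x_j) >= eps/2 for all j < m.

    When the loop ends, every point lies within eps/2 of some chosen
    center, so the balls B(x_j; eps/2) cover the whole set.
    """
    centers = []
    for p in points:
        if all(dist(p, c) >= eps / 2 for c in centers):
            centers.append(p)
    return centers
```

On a finite set the process always stops; the content of the theorem is that in a totally bounded space it must stop after finitely many steps for every ε > 0.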

Now we shall show that a space with the property described in the
theorem must be totally bounded. In the proof we shall use the obvious
fact that if X has this property, then so does any subset.

Let {x_k} be any sequence in X. Write X as the union of a finite
number of sets of diameter ≤ 1. At least one of these contains infinitely
many terms of the sequence {x_k}. Choose any one that does and call it
X₁, and choose k₁ so that x_{k₁} ∈ X₁. Write X₁ as the union of a finite
number of sets of diameter ≤ 1/2. At least one of these must contain
infinitely many terms of the sequence {x_k}, for X₁ contains infinitely many
terms. Choose one of these and call it X₂, and choose k₂ > k₁ so that
x_{k₂} ∈ X₂. In general, write X_{m−1} as the union of a finite number of sets
of diameter ≤ 1/m. At least one must contain infinitely many terms of
the sequence {x_k}. Choose such a one and call it X_m, and choose k_m > k_{m−1}
so that x_{k_m} ∈ X_m. The sequence {x_{k_m}} is certainly Cauchy, for if n > m,
then x_{k_n} and x_{k_m} both belong to X_m, which has diameter ≤ 1/m; so
d(x_{k_n}, x_{k_m}) ≤ 1/m.


Theorem 9.9 suggests a more geometric way to show that bounded sets in
Rⁿ are totally bounded. It is enough to show that any rectangle


R = {x : a_i ≤ x_i ≤ b_i for i = 1, . . . , n},


where a_i < b_i, is totally bounded. Consider first R². The idea is to cut R into
four similar rectangles of half the size by the two lines through the center. Then
cut each of these into four, and so on. It is clear that this process eventually
cuts R into a finite number of rectangles of arbitrarily small diameter. Consider
next R³. This time R is cut into eight smaller rectangles by the three planes


Exercise 4 


DEFINITION 
9.10 


Exercise 5 


Exercise 6 


THEOREM 
9.11 


Proof 


. Exercise 7 




through the center, and the argument goes as before. What we have to do is
interpret this geometric construction analytically so that it is valid in general.

First, the diameter of the rectangle is |b − a|, for if x and y are any two
points of R, then a_i ≤ x_i ≤ b_i and a_i ≤ y_i ≤ b_i, so |y_i − x_i| ≤ b_i − a_i; hence


|y − x| = √(Σ_{i=1}^{n} (y_i − x_i)²) ≤ √(Σ_{i=1}^{n} (b_i − a_i)²) = |b − a|.


The center of the rectangle (and this can be taken as the definition if you
like) is the point c = (a + b)/2. The smaller rectangles are obtained as follows:
Choose any u = (u₁, . . . , u_n) such that either u_i = a_i, or else u_i = c_i. (There
are 2ⁿ such points u.) For each u, define a corresponding v as follows: If u_i = a_i,
then v_i = c_i, while if u_i = c_i, then v_i = b_i. Then set


R_u = {x : u_i ≤ x_i ≤ v_i for i = 1, . . . , n}.


Since there are 2ⁿ points u, there are 2ⁿ rectangles R_u, and it is clear that every
point of R belongs to one of them. It is also clear that v − u = (b − a)/2, so
the diameter of R_u is one half the diameter of R.
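The analytic description of the subdivision translates directly into a short program. The sketch below is only an illustration, assuming the Euclidean absolute value; the function names subdivide and diameter are hypothetical. Each choice of u is encoded by a tuple of 0s and 1s, with 0 meaning u_i = a_i and 1 meaning u_i = c_i.

```python
import itertools
import math

def subdivide(a, b):
    """Return the 2**n sub-rectangles, as pairs (u, v), of the rectangle with corners a, b."""
    c = [(ai + bi) / 2 for ai, bi in zip(a, b)]  # center of the rectangle
    pieces = []
    for choice in itertools.product((0, 1), repeat=len(a)):
        # choice[i] == 0: u_i = a_i and v_i = c_i; choice[i] == 1: u_i = c_i and v_i = b_i
        u = tuple(a[i] if s == 0 else c[i] for i, s in enumerate(choice))
        v = tuple(c[i] if s == 0 else b[i] for i, s in enumerate(choice))
        pieces.append((u, v))
    return pieces

def diameter(a, b):
    """Diameter |b - a| of the rectangle with corners a, b."""
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))
```

Applying subdivide repeatedly to each piece halves the diameter at every stage, which is the sense in which the process produces finitely many rectangles of arbitrarily small diameter.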


If J is an infinite set, the sphere S(@; r) in ®() is not totally bounded. Hence 
neither is the ball. The case is the same in @(J). 


It often happens in analysis that functions to be minimized are not con- 
tinuous, but have the following weaker property. 


A real-valued function f on a metric space X is lower semicontinuous if


f(x) ≤ lim inf f(x_n) whenever x_n → x.


f : X → R¹ is lower semicontinuous if and only if f⁻¹({x : x ≤ α}) is closed for
every real α.


If f : X → R¹ is lower semicontinuous, and X is compact, then f has a minimum.


Let X be a compact metric space, and let 𝒮 be a family of open subsets with
union X. Then there is a finite subfamily with union X.


For each x ∈ X, let f(x) be the least upper bound of the numbers r such that
B(x; r) ⊂ G for some G in the family 𝒮.


The function f is lower semicontinuous. 




To complete the proof, use Theorem 9.9 to find a finite number of points
x₁, . . . , x_m such that X is the union of the balls B(x_i; r/2), where r is the
minimum of the function f which is guaranteed by Exercise 6. By the
definition of r, B(x_i; r/2) is contained in some G ∈ 𝒮, and the proof is complete.


Exercise 8 The converse of Theorem 9.11 is also true. If X has the property that whenever 
it is covered by a family of open sets it must be covered by a finite subfamily, 
then X is compact. 


10 EQUIVALENCE OF ABSOLUTE VALUES ON Rⁿ


Now we can prove easily that any two absolute values on Rⁿ are equivalent.


THEOREM 10.1  For any absolute value ‖x‖ on Rⁿ there are positive numbers m and M such that


m|x| ≤ ‖x‖ ≤ M|x| for all x ∈ Rⁿ.

Recall that ‖x‖ is an absolute value if it has the following three properties:


1. ‖x‖ ≥ 0 and ‖x‖ = 0 only if x = 0.
2. ‖αx‖ = |α| ‖x‖ if α is a real number.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.


Proof  Let e_i be the point in Rⁿ with ith coordinate 1 and all the rest 0. Then if
x = (x₁, . . . , x_n), it follows that


x = Σ_{i=1}^{n} x_i e_i.


Properties 3 and 2 then give


‖x‖ ≤ Σ_{i=1}^{n} |x_i| ‖e_i‖,


and the Cauchy–Schwarz inequality gives


‖x‖ ≤ M|x|, with M = √(Σ_{i=1}^{n} ‖e_i‖²).   (1)


This proves half of the theorem and also shows that the function f(x) = ‖x‖
is continuous; for from formula (1) of Section 6 we have


| ‖x‖ − ‖y‖ | ≤ ‖x − y‖ ≤ M|x − y|.   (2)


11 




Since f is continuous and the sphere S = S(0; 1) is compact, f has a
minimum m on S. By property 1, m > 0. We have ‖y‖ ≥ m for every
y ∈ S, that is, for every y with |y| = 1. If x ≠ 0, we can apply this to
y = x/|x| to get


‖ x/|x| ‖ ≥ m;


then multiply both sides by |x| and use property 2 to get
‖x‖ ≥ m|x|.


So far this is proved for x ≠ 0, but both sides are 0 for x = 0, so we are


done. 
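The two constants in Theorem 10.1 can be computed in a simple case. The sketch below is an illustration under stated assumptions: the absolute value ‖x‖ = |x₁| + · · · + |x_n| is chosen here only as an example, the function names are hypothetical, and the minimum m over the unit circle in R² is estimated by sampling rather than found exactly.

```python
import math

def euclid(x):
    """The usual absolute value |x| on R^n."""
    return math.sqrt(sum(xi * xi for xi in x))

def sum_norm(x):
    """Example absolute value ||x|| = |x_1| + ... + |x_n| (an illustrative choice)."""
    return sum(abs(xi) for xi in x)

def upper_constant(norm, n):
    """M = sqrt(sum of ||e_i||^2), the constant appearing in formula (1)."""
    total = 0.0
    for i in range(n):
        e = tuple(1.0 if j == i else 0.0 for j in range(n))
        total += norm(e) ** 2
    return math.sqrt(total)

def lower_constant_estimate(norm, samples=3600):
    """Estimate m = min ||y|| over the unit circle in R^2 by sampling angles."""
    return min(norm((math.cos(2 * math.pi * k / samples),
                     math.sin(2 * math.pi * k / samples)))
               for k in range(samples))
```

For this example in R² the formula gives M = √2, and the sampled minimum is 1 (attained at the coordinate vectors), so |x| ≤ ‖x‖ ≤ √2 |x|.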
PRODUCTS 
If X₁ and X₂ are sets, then X₁ × X₂ is the set of ordered pairs (x₁, x₂) with
x₁ ∈ X₁ and x₂ ∈ X₂. More generally, if X₁, . . . , X_n are sets, then X₁ ×
· · · × X_n is the set of n-tuples (x₁, . . . , x_n) with x_i ∈ X_i. If X is a single
set, then Xⁿ is the set of n-tuples (x₁, . . . , x_n) with x_i ∈ X. In the particular
case where X = R is the set of real numbers, this terminology agrees with the
terminology we have been using, for Rⁿ was defined to be the set of all n-tuples
of real numbers.

If X_j is a metric space with metric d_j, there are various natural ways to
define a metric on X = X₁ × · · · × X_n. The one suggested by Rⁿ is


d(x, y) = √(Σ_j d_j(x_j, y_j)²).   (1)


It is plain that d(x, x) = 0, d(x, y) > 0 if x ≠ y, and d(y, x) = d(x, y); so what
has to be established is the triangle inequality d(x, z) ≤ d(x, y) + d(y, z). To
see this, let a, b, and c be the points of Rⁿ with coordinates a_j = d_j(x_j, y_j),
b_j = d_j(y_j, z_j), and c_j = d_j(x_j, z_j). From the triangle inequality in X_j we have
0 ≤ c_j ≤ a_j + b_j, and, therefore, |c| ≤ |a + b| ≤ |a| + |b|, which is just what
is required.

This argument suggests a way to define all kinds of metrics on X. Choose
any absolute value ‖ ‖ on Rⁿ. If x and y are two points of X, let a(x, y) be the
point in Rⁿ with coordinates a_j = d_j(x_j, y_j) and define

d(x, y) = ‖a(x, y)‖.   (2)


The argument above shows that each of these is a metric on X, and the
theorem of the last section (equivalence of absolute values on Rⁿ) shows that
all of them are equivalent. The ones most commonly seen are the initial one




Exercise 1 
Exercise 2 
Exercise 3 
Exercise 4 


Exercise 5 


12


THEOREM 
12.1 


Example 1 


in formula (1) and
d′(x, y) = Σ_j d_j(x_j, y_j),
d″(x, y) = max_j d_j(x_j, y_j).


Whenever we speak of the product of metric spaces it is to be understood 
that the metric is the initial one of formula (1) unless otherwise stated, although 
in most questions this one can be replaced by any of the equivalent metrics of 
formula (2). 
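The three commonly seen product metrics can be computed side by side. This is a minimal sketch: the function name product_metrics and the representation of the factor metrics as Python callables are assumptions made for illustration, not notation from the text.

```python
import math

def product_metrics(metrics, x, y):
    """Return (d, d_sum, d_max) for points x, y of X_1 x ... x X_n.

    metrics[j] is the metric d_j on the factor X_j, given as a
    function of two arguments.
    """
    a = [dj(xj, yj) for dj, xj, yj in zip(metrics, x, y)]  # the point a(x, y) in R^n
    d = math.sqrt(sum(aj * aj for aj in a))  # formula (1)
    d_sum = sum(a)                           # sum of the coordinate distances
    d_max = max(a)                           # largest coordinate distance
    return d, d_sum, d_max
```

With the usual metric |s − t| on each factor of R × R and the points (0, 0) and (3, 4), the three values are 5, 7, and 4; in general the maximum is at most the metric of formula (1), which is at most the sum, which is at most n times the maximum. These inequalities are one concrete way to see that the three metrics are equivalent.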


The product of complete spaces is complete. 

The product of totally bounded spaces is totally bounded. 
The product of compact spaces is compact. 

The product of connected spaces is connected. 


The product of path-connected spaces is path connected. 


STONE-WEIERSTRASS APPROXIMATION THEOREM 


In this section we shall prove a far-reaching generalization of the Weierstrass 
approximation theorem (Section 8 of the last chapter). 


(Stone–Weierstrass Approximation Theorem). Let X be a compact
metric space, and let 𝒜 be any class of continuous real-valued functions on X
with the following properties:

(a) Each constant function is in 𝒜.

(b) If f and g are in 𝒜, then so are f + g and fg.

(c) For any two distinct points x and y of X, there is at least one
function f in 𝒜 with f(x) ≠ f(y).

Then every continuous real-valued function on X can be approximated
uniformly by functions in 𝒜.


The initial Weierstrass theorem is essentially the case where X is a closed
bounded interval in R¹ and 𝒜 is the set of polynomials. Two other fundamental
examples are as follows:


X is any compact set in Rⁿ and 𝒜 is the set of polynomials in n variables. In
this case the theorem says that if f : X → R¹ is continuous, then there is a sequence
{f_k} of polynomials such that f_k → f uniformly on X.


Exercise 1  In this example conditions (a) and (b) are obvious. Establish condition (c).
(It is obvious, too!)

Example 2  Let $\mathcal{A}$ be the set of "trigonometric polynomials" on the interval $[-\pi, \pi]$, that is,
the set of functions of the form

$$p(x) = \sum_{k=0}^{n} a_k \cos kx + \sum_{k=1}^{n} b_k \sin kx, \qquad -\pi \le x \le \pi.$$

The number $n$ is not fixed. The sums are arbitrary finite sums. The numbers
$a_k$ and $b_k$ are arbitrary real numbers.

Exercise 2  Show that conditions (a) and (b) hold when $\mathcal{A}$ is the set of trigonometric
polynomials and $X = [-\pi, \pi]$.

In this example we have to be a little careful because condition (c) in the
theorem is not satisfied when $X = [-\pi, \pi]$. Every trigonometric polynomial
takes the same value at $-\pi$ as at $\pi$. Therefore, we must "identify" $-\pi$ and $\pi$;
that is, wrap the interval $[-\pi, \pi]$ around the unit circle in the plane. In this
case the theorem says that every continuous function $f$ on $[-\pi, \pi]$ such that
$f(-\pi) = f(\pi)$ can be approximated uniformly by trigonometric polynomials.

Exercise 3  Give a rigorous proof of the contention in the last paragraph.

Exercise 4  What kind of functions on $[0, \pi]$ can be approximated by trigonometric
polynomials that involve just the sines? Or just the cosines? (Hint: Apply the
result of Example 2 and note that the sines are odd functions and the cosines are
even functions.)

Proof of the Theorem  The space $C(X)$ of all continuous real-valued functions on $X$ is a metric
space with the absolute value

$$\|f\| = \sup\{|f(x)| : x \in X\}$$

and the distance $d(f, g) = \|f - g\|$. Convergence in the space $C(X)$ is
just uniform convergence of functions. Therefore, the conclusion of the
theorem is just that

$$\bar{\mathcal{A}} = C(X).$$

LEMMA 12.2  $\bar{\mathcal{A}}$ satisfies conditions (a), (b), and (c) in Theorem 12.1.

Proof  Conditions (a) and (c) are self-evident because $\bar{\mathcal{A}} \supset \mathcal{A}$. As for (b),
suppose that $f$ and $g$ are in $\bar{\mathcal{A}}$, and choose sequences $\{f_n\}$ and $\{g_n\}$ in $\mathcal{A}$ so that
$f_n \to f$ and $g_n \to g$.

Exercise 5  Use the usual familiar proofs to show that $f_n + g_n \to f + g$ and $f_n g_n \to fg$, and
conclude that $f + g$ and $fg$ are in $\bar{\mathcal{A}}$.


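LEMMA 12.3  If $f \in \bar{\mathcal{A}}$, then $|f| \in \bar{\mathcal{A}}$.

Proof  Let $\varepsilon > 0$ be given and let $M$ be the maximum of $|f|$. Use the original
Weierstrass approximation theorem to find a polynomial

$$p(t) = \sum_{k=1}^{m} a_k t^k$$

which approximates the continuous function $|t|$ to within $\varepsilon$ on the interval
$[-M, M]$. Then for any point $x \in X$ we have

$$\bigl|\,|f(x)| - p(f(x))\,\bigr| < \varepsilon,$$

or, in other words,

$$\bigl\|\,|f| - p \circ f\,\bigr\| < \varepsilon.$$

By virtue of conditions (a) and (b) (for $\bar{\mathcal{A}}$) and the fact that $f \in \bar{\mathcal{A}}$, it
follows that the function

$$p \circ f = \sum_{k=1}^{m} a_k f^k$$

is in $\bar{\mathcal{A}}$. Thus, $|f| \in \bar{\bar{\mathcal{A}}} = \bar{\mathcal{A}}$.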
LEMMA 12.4  If $f_1, \ldots, f_m$ are in $\bar{\mathcal{A}}$, then the functions

$$\max(f_1, \ldots, f_m) \quad\text{and}\quad \min(f_1, \ldots, f_m)$$

are in $\bar{\mathcal{A}}$.

Proof  The maximum can be obtained by taking first the maximum of $f_1$ and $f_2$,
and then the maximum of this with $f_3$, and so on, so it suffices to deal
with two functions, in which case we have the formulas

$$\max(f, g) = \frac{f + g}{2} + \frac{|f - g|}{2}, \qquad \min(f, g) = \frac{f + g}{2} - \frac{|f - g|}{2}.$$

The formulas are verified by simply noting that $|f - g| = f - g$ if $f \ge g$,
while $|f - g| = g - f$ if $g \ge f$ (all at an arbitrary point $x$, of course).
This lemma results from these formulas and the previous lemma.
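
The two formulas used in this proof express the maximum and minimum through sums and absolute values only, and they are easy to check numerically. The following is a minimal Python sketch; the helper names `max_via_abs` and `min_via_abs` and the sample functions (sine and cosine) are illustrative choices, not taken from the text.

```python
import math

def max_via_abs(f, g):
    """Pointwise max(f, g) built from sums and absolute values only."""
    return lambda x: (f(x) + g(x)) / 2 + abs(f(x) - g(x)) / 2

def min_via_abs(f, g):
    """Pointwise min(f, g) built from sums and absolute values only."""
    return lambda x: (f(x) + g(x)) / 2 - abs(f(x) - g(x)) / 2

# Sample functions (arbitrary choices for illustration).
f = math.sin
g = math.cos

hi = max_via_abs(f, g)
lo = min_via_abs(f, g)

# Compare with the built-in max and min on a grid of points in [-3, 3].
points = [k / 10 for k in range(-30, 31)]
worst = max(
    max(abs(hi(x) - max(f(x), g(x))), abs(lo(x) - min(f(x), g(x))))
    for x in points
)
print(worst)  # largest discrepancy on the grid; only rounding error is expected
```

Because only sums, scalar multiples, and absolute values appear, any class of functions closed under those operations is closed under pairwise maxima and minima, which is the content of the lemma.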
LEMMA 12.5  If $F_0$ and $F_1$ are disjoint closed subsets of $X$, then there is a function $f \in \bar{\mathcal{A}}$
such that

$$f = 0 \text{ on } F_0, \qquad f = 1 \text{ on } F_1, \qquad 0 \le f(x) \le 1 \text{ for all } x \in X.$$

Proof  Let $x$ and $y$ be fixed points of $F_0$ and $F_1$ and use condition (c) to find
$f \in \bar{\mathcal{A}}$ with $f(x) \ne f(y)$. Let $\alpha = f(x)$ and $\beta = f(y)$, and consider the
function

$$g = \frac{f - \alpha}{\beta - \alpha}.$$

Clearly, we have $g(x) = 0$ and $g(y) = 1$. Now let $h = \min(g, 1)$, and
then $k = \max(h, 0)$. It is immediate that $k(x) = 0$, $k(y) = 1$, and
$0 \le k(z) \le 1$ for all $z \in X$; and from Lemma 12.4 [and condition (a)] it
follows that $k \in \bar{\mathcal{A}}$. (Thus, we have proved the lemma when $F_0$ and $F_1$
consist of the single points $x$ and $y$!)

Let $x$ be a fixed point of $F_0$. For each $y \in F_1$, use what has just been
proved to find a function $f_y \in \bar{\mathcal{A}}$ such that

$$f_y(x) = 0, \qquad f_y(y) = 1, \qquad 0 \le f_y(z) \le 1 \text{ for all } z \in X,$$

and let

$$G_y = \{z : f_y(z) > \tfrac12\}.$$

Now, $G_y$ is open by Theorem 6.4, and the family of the $G_y$ covers the set
$F_1$ (since $y \in G_y$). Also $F_1$ is compact, for it is plain that any closed
subset of a compact space is compact. Hence, it follows from Theorem
9.11 that a finite number of the $G_y$ cover $F_1$. Call them $G_{y_1}, \ldots, G_{y_m}$.
This means that at each point of $F_1$ at least one of the functions $f_{y_1}, \ldots,
f_{y_m}$ is $> \tfrac12$, and hence that the function

$$g = \max(f_{y_1}, \ldots, f_{y_m})$$

is $> \tfrac12$ at each point of $F_1$. On the other hand, each of the functions $f_y$ is 0
at $x$, so $g(x) = 0$. For the function $h = \min(2g, 1)$, we have

$$h(x) = 0, \qquad h = 1 \text{ on } F_1, \qquad 0 \le h(z) \le 1 \text{ for all } z \in X.$$

(Thus, we have proved the lemma when $F_0$ consists of the single point $x$
and $F_1$ is arbitrary!)

For each point $x \in F_0$, use what has just been proved to find a function
$f_x \in \bar{\mathcal{A}}$ such that

$$f_x(x) = 0, \qquad f_x = 1 \text{ on } F_1, \qquad 0 \le f_x(z) \le 1 \text{ for all } z \in X,$$

and set

$$G_x = \{z : f_x(z) < \tfrac12\}.$$

By the same reasoning as before, a finite number, $G_{x_1}, \ldots, G_{x_m}$, of the
$G_x$ covers the set $F_0$. Consequently, the function

$$g = \min(f_{x_1}, \ldots, f_{x_m})$$


is $< \tfrac12$ at each point of $F_0$, is equal to 1 at each point of $F_1$, and is between
0 and 1 everywhere. Therefore, the function $f = \max(2g - 1, 0)$ does
the job required by the lemma.

One final lemma will almost complete the proof.

LEMMA 12.6  If $g \in C(X)$, $g \ge 0$, then there is an $f \in \bar{\mathcal{A}}$ with

$$0 \le f \le g \quad\text{and}\quad \|g - f\| \le \tfrac23 \|g\|.$$

Proof  Define the closed sets $F_0$ and $F_1$ by

$$F_0 = \{x : g(x) \le \tfrac13 \|g\|\} \quad\text{and}\quad F_1 = \{x : g(x) \ge \tfrac23 \|g\|\},$$

and use Lemma 12.5 (multiplied by $\tfrac13 \|g\|$) to find a function $f \in \bar{\mathcal{A}}$ such
that

$$f = 0 \text{ on } F_0, \qquad f = \tfrac13 \|g\| \text{ on } F_1, \qquad 0 \le f(x) \le \tfrac13 \|g\| \text{ for all } x \in X.$$

In the three cases, $x \in F_0$, $x \in F_1$, and $x \in$ neither, it is easily checked
that

$$0 \le g(x) - f(x) \le \tfrac23 \|g\|,$$

and this proves the lemma.
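
The three-case check can be written out explicitly. The display below is a sketch under stated assumptions: it takes the thresholds in the proof to be $\tfrac13\|g\|$ and $\tfrac23\|g\|$, with $f$ taking values in $[0, \tfrac13\|g\|]$, and it writes $c = \|g\|$ as a shorthand introduced here only for brevity.

```latex
\begin{align*}
x \in F_0 &: \quad f(x) = 0, \quad 0 \le g(x) \le \tfrac13 c
  &&\Longrightarrow\quad 0 \le g(x) - f(x) \le \tfrac13 c, \\
x \in F_1 &: \quad f(x) = \tfrac13 c, \quad \tfrac23 c \le g(x) \le c
  &&\Longrightarrow\quad \tfrac13 c \le g(x) - f(x) \le \tfrac23 c, \\
x \notin F_0 \cup F_1 &: \quad 0 \le f(x) \le \tfrac13 c, \quad \tfrac13 c < g(x) < \tfrac23 c
  &&\Longrightarrow\quad 0 < g(x) - f(x) < \tfrac23 c.
\end{align*}
```

In every case $0 \le g(x) - f(x) \le \tfrac23 c$, which gives both $0 \le f \le g$ and the bound on $\|g - f\|$.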

To complete the proof we must show that every $g \in C(X)$ must lie
in $\bar{\mathcal{A}}$. It is enough to show that every $g \ge 0$ in $C(X)$ must lie in $\bar{\mathcal{A}}$, for
if we know this then we shall know that both

$$g^+ = \max(g, 0) \quad\text{and}\quad g^- = \min(g, 0)$$

lie in $\bar{\mathcal{A}}$ (apply the statement to $g^+$ and to $-g^-$, both of which are $\ge 0$), and
therefore that $g = g^+ + g^-$ lies in $\bar{\mathcal{A}}$. [The fact that $g^+$ and
$g^-$ belong to $C(X)$ follows from Lemma 12.4 with $\bar{\mathcal{A}}$ replaced by $C(X)$.]

Suppose, therefore, that $g \in C(X)$ and that $g \ge 0$. Let $f_1$ be the
function $f \in \bar{\mathcal{A}}$ given by Lemma 12.6, and put $g_1 = g - f_1$. Now use
the lemma again with $g_1$ in place of $g$, let $f_2$ be the corresponding function
in $\bar{\mathcal{A}}$, and let $g_2 = g_1 - f_2$. Once we have found $f_1, \ldots, f_n$ and
$g_1, \ldots, g_n$, we use the lemma with $g_n$ in place of $g$, find the corresponding
function $f_{n+1}$ in $\bar{\mathcal{A}}$, and put $g_{n+1} = g_n - f_{n+1}$. With this construction we
get that $g_{n+1} \ge 0$ and that

$$\|g_{n+1}\| \le \tfrac23 \|g_n\| \le (\tfrac23)^2 \|g_{n-1}\| \le \cdots \le (\tfrac23)^{n+1} \|g\|.$$

Furthermore, we get that

$$g = g_1 + f_1 = g_2 + f_1 + f_2 = \cdots = g_{n+1} + \sum_{k=1}^{n+1} f_k.$$


Since the sum on the right lies in $\bar{\mathcal{A}}$, the two formulas together show that
$g \in \bar{\bar{\mathcal{A}}} = \bar{\mathcal{A}}$.

Exercise 6  The original Weierstrass theorem applied to an arbitrary interval, not just a
compact one, and involved uniform approximation on all compact subsets.
Invent a theorem that applies to any metric space that is the union of a sequence
of open sets, each of which has compact closure. Show that any open set in $R^n$
is a metric space with this property.

It was shown in Example 2, as an application of the Stone-Weierstrass
theorem, that every continuous function on $[-\pi, \pi]$ which takes the same value
at the end points can be approximated uniformly by trigonometric polynomials.
Of course, a function that does not take the same value at the end points cannot
be approximated uniformly (or even pointwise) by trigonometric polynomials,
because all trigonometric polynomials do take the same value at the end points.
In formulas (3) and (4) of Section 4 another metric was defined on the space
$C([-\pi, \pi])$ of continuous functions on $[-\pi, \pi]$ by means of the inner product
and absolute value

$$(f, g) = \int_{-\pi}^{\pi} f(x) g(x)\, dx, \qquad \|f\|_2 = (f, f)^{1/2}. \tag{1}$$

THEOREM 12.7  Let $f$ be continuous on $[-\pi, \pi]$. For each $\varepsilon > 0$ there is a trigonometric
polynomial $p$ such that

$$\|f - p\|_2 < \varepsilon.$$

Proof  For each positive integer $k$, let $\varphi_k$ be a continuous function on the line that
is equal to 1 for $|x| \le \pi - 1/k$, equal to 0 for $|x| \ge \pi - 1/(2k)$, and between
0 and 1 everywhere. On the one hand, we have

$$\|f - \varphi_k f\|_2^2 = \int_{-\pi}^{\pi} |f(x)|^2 (1 - \varphi_k(x))^2\, dx \le \frac{2}{k} \|f\|^2,$$

because $1 - \varphi_k$ vanishes except on a set of total length $2/k$. On the other hand,
$\varphi_k f$ does take the same value (which is 0) at the two
end points, so Example 2 shows that there is a trigonometric polynomial
$p_k$ such that

$$\|\varphi_k f - p_k\|_2^2 \le 2\pi \|\varphi_k f - p_k\|^2 \le \frac{1}{k}.$$

The two inequalities together prove the theorem as a result of the triangle
inequality for the absolute value $\|\ \|_2$.


8 Functions from $R^1$ to $R^n$


1 LINES, HALF-LINES, AND DIRECTIONS

DEFINITION 1.1  The line passing through the two distinct points $a$ and $b$ of $R^n$ is the set of all
points $x$ of the form

$$x = (1 - t)a + tb = a + t(b - a), \qquad t \text{ real}.$$

Let us check that this agrees with the usual notion of line in $R^2$. The
slope of the line in $R^2$ passing through the points $a$ and $x$ is

$$\frac{x_2 - a_2}{x_1 - a_1}.$$

Therefore, $x$ is on the line passing through $a$ and $b$ if and only if

$$\frac{x_2 - a_2}{x_1 - a_1} = \frac{b_2 - a_2}{b_1 - a_1}. \tag{1}$$

Suppose that

$$x = (1 - t)a + tb = a + t(b - a).$$

Then $x - a = t(b - a)$, so

$$x_1 - a_1 = t(b_1 - a_1) \quad\text{and}\quad x_2 - a_2 = t(b_2 - a_2),$$

from which it is plain that formula (1) holds. On the other hand, if formula
(1) holds, then $x - a = t(b - a)$ with

$$t = \frac{x_1 - a_1}{b_1 - a_1}.$$

Exercise 1  The above argument requires that $b_1 \ne a_1$. Provide an argument for the case
$b_1 = a_1$.


Exercise 2  Discuss the situation in $R^3$.

DEFINITION 1.2  The half-line starting at the point $a$ and passing through the point $b \ne a$ is
the set of all points $x$ of the form

$$x = (1 - t)a + tb = a + t(b - a), \qquad t \ge 0.$$

Exercise 3  Show that the definition agrees with the geometric notion in $R^2$.

DEFINITION 1.3  A direction in $R^n$ is a half-line starting at 0. The direction of a point $b \in R^n$
is the direction that contains it; that is, it is the half-line that starts at 0 and
passes through $b$. (Of course, $b \ne 0$.)

Each point $b \ne 0$ determines a direction, but of course many points
determine the same direction. Exactly one of these, $b/|b|$, has absolute value 1.
Therefore, a direction can be defined alternatively as follows.

DEFINITION 1.4  A direction in $R^n$ is a point of absolute value 1.

The first definition is a little more intuitive, but the second is usually more
convenient in practice. Both definitions are used, and the context determines
which one is relevant.

Let $h$ be the half-line

$$h = \{x : x = a + t(b - a),\ t \ge 0\}$$

and let

$$\theta = \frac{b - a}{|b - a|}.$$

Then clearly

$$h = \{x : x = a + t\theta,\ t \ge 0\}. \tag{2}$$

The points $a$ and $\theta$ are called the initial point and the direction of the half-line $h$.

THEOREM 1.5  A half-line determines its initial point and direction uniquely.

Proof  Suppose that $h$ is given by (2) and that also

$$h = \{x : x = a' + s\theta',\ s \ge 0\}.$$

Then

$$a' = a + t_0\theta \quad\text{and}\quad a = a' + s_0\theta',$$

so $s_0\theta' = -t_0\theta$. Since both $\theta$ and $\theta'$ have absolute value 1, it follows that
$|s_0| = |t_0|$; then since both $s_0$ and $t_0$ are $\ge 0$, it follows that $s_0 = t_0$, and
hence either that $\theta' = -\theta$, or else that $s_0 = t_0 = 0$ and therefore $a' = a$.


Exercise 4  Show that if $a' = a$, then it follows immediately that $\theta' = \theta$.

To finish the proof, we have to rule out the possibility that $\theta' = -\theta$. To
do this, consider the point

$$x = a + t\theta, \qquad\text{with } t > t_0.$$

This point is on $h$, so we must have

$$a + t\theta = a' + s\theta' = a + t_0\theta - s\theta = a + (t_0 - s)\theta.$$

Therefore, $t = t_0 - s$, which is impossible, since $t > t_0$ and $s \ge 0$.

Exercise 5  Let $h$ be the half-line with initial point $a$ and direction $\theta$. Let $a'$ be a point of $h$,
and let $h'$ be the half-line with initial point $a'$ and direction $-\theta$. Show that

$$h \cap h' = \{x : x = (1 - t)a + ta',\ 0 \le t \le 1\}.$$

Thus, $h \cap h'$ is the line segment from $a$ to $a'$, which, of course, is what it should be.

Exercise 6  Let $a$, $b$, and $c$ be three points in $R^n$. Define what it means to say that $b$ is
between $a$ and $c$. Show that $b$ is between $a$ and $c$ if and only if

$$d(a, c) = d(a, b) + d(b, c).$$

A point $a$ and a direction $\theta$ also determine a line, namely the set

$$l = \{x : x = a + t\theta,\ t \text{ real}\}.$$

The line is the union of the two half-lines with initial point $a$ and directions $\pm\theta$.

Exercise 7  A line does not determine a direction, but a pair of opposite directions. If $a$
and $b$ are any two points of the line, then the two opposite directions are

$$\pm \frac{b - a}{|b - a|}.$$

DEFINITION 1.6  An oriented line is a line on which one of the two directions is designated as the
"positive" direction.

Remark  Geometrical statements often appear obvious simply because of the language.
No one could seriously doubt that a half-line determines its initial point and
direction. However, the language by itself does not constitute a proof; rather,
the theorems that can be proved justify the language.


2 DERIVATIVES AND INTEGRALS

The general topic of the rest of the book is the study of functions from one space
$R^m$ to another $R^n$. The easiest case to begin with is the case $m = 1$, which has
many similarities, as well as some sharp differences, with the case where both $m$
and $n$ are 1, which has been discussed already. The study of functions from
$R^1$ to $R^n$ is, of course, the study of paths, curves, or arcs in $R^n$, but this is not
always the fruitful point of view to take.

DEFINITION 2.1  A function $f: I \to R^n$, where $I$ is an interval in $R^1$, is differentiable at an
interior point $a$ of $I$ if the limit

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

exists. The value of the limit is called the derivative of $f$ at $a$ and is denoted
by $f'(a)$.

The definition is formally identical with the initial definition for the case
$n = 1$. However, $f(a)$, $f(x)$, and $[f(x) - f(a)]/(x - a)$ are now all points of $R^n$,
and so is the limit $f'(a)$. Differentiability is brought back to the one-dimensional
case by the following theorem.


THEOREM 2.2  Let $f: I \to R^n$, and let

$$f(x) = (f_1(x), \ldots, f_n(x)).$$

Then $f$ is differentiable at the point $a \in I$ if and only if each $f_k$ is differentiable
at $a$. If $f$ is differentiable at $a$, then

$$f'(a) = (f_1'(a), \ldots, f_n'(a)).$$

Proof  This should be a routine, if somewhat burdensome, calculation by now.
To make sure that the calculation is routine, we shall carry out the part
which says that if $f$ is differentiable at $a$, then each $f_k$ is differentiable at $a$,
and $f'(a) = (f_1'(a), \ldots, f_n'(a))$. For any point $y \in R^n$ we have $|y_k| \le |y|$.
Consequently, if $f'(a) = (b_1, \ldots, b_n)$, then

$$\left| \frac{f_k(x) - f_k(a)}{x - a} - b_k \right| \le \left| \frac{f(x) - f(a)}{x - a} - f'(a) \right|.$$

Given $\varepsilon > 0$ we can find $\delta > 0$ such that if $|x - a| < \delta$ and $x \ne a$, then
the right-hand side is $< \varepsilon$. This shows that $f_k$ is differentiable and its
derivative is $b_k$.


Exercise 1  Carry out the other half of the proof.

Exercise 2  If $f$ is differentiable at $a$, then $f$ is continuous at $a$.

Exercise 3  Use the example $f(x) = (\cos x, \sin x)$, $0 \le x \le 2\pi$, to show that the mean-value
theorem does not hold for functions with values in $R^n$.

An entirely similar situation prevails with regard to integrals.

DEFINITION 2.3  Let $p = (x_0, \ldots, x_m)$ and $(\xi_1, \ldots, \xi_m)$ be a partition of the closed
bounded interval $I$. If $f: I \to R^n$, then

$$S(p; f) = \sum_{k=1}^{m} f(\xi_k)(x_k - x_{k-1}).$$

The function $f$ is integrable on $I$, and the integral is the point $L \in R^n$ if for
every positive number $\varepsilon$ there is a positive number $\delta$ such that if $|p| < \delta$, then

$$|S(p; f) - L| < \varepsilon.$$

THEOREM 2.4  Let $f: I \to R^n$, where $I = [a, b]$, and let $f(x) = (f_1(x), \ldots, f_n(x))$.
Then $f$ is integrable on $I$ if and only if each $f_k$ is integrable on $I$. If $f$ is
integrable on $I$, then the $k$th coordinate of $\int_a^b f(x)\, dx$ is $\int_a^b f_k(x)\, dx$.

Proof  This time we shall prove the other half of the theorem. Suppose that
each $f_k$ is integrable on $I$, let

$$L_k = \int_a^b f_k(x)\, dx,$$

and let $L = (L_1, \ldots, L_n)$. For any given $\varepsilon > 0$, Theorem 6.3 of
Chapter 4 provides $\delta > 0$ such that if $p$ is any partition with $|p| < \delta$, then
$|S(p; f_k) - L_k| < \varepsilon$ for every $k$, in which case

$$|S(p; f) - L| < \sqrt{n}\, \varepsilon.$$

Exercise 4  Prove the other half of the theorem.

In this context the upper and lower sums that were prevalent in Chapter 4
are meaningless. It makes no sense to say that one point of $R^n$ is "larger"
than another.

THEOREM 2.5  Let $f: I \to R^n$ be bounded and continuous at all but a finite number of points.
Then

(a) $F(x) = \int_a^x f(t)\, dt$ is a primitive of $f$.

(b) If $G$ is any primitive of $f$, then $G(b) - G(a) = \int_a^b f(t)\, dt$.


Exercise 5  Prove the theorem on the basis of Theorems 2.2 and 2.4 and the results of
Chapter 4.

THEOREM 2.6  Let $f: I \to R^n$ be bounded and continuous at all but a finite number of points.
Then

$$\left| \int_a^b f(x)\, dx \right| \le \int_a^b |f(x)|\, dx.$$

Proof  The triangle inequality in $R^n$ shows that $|S(p; f)| \le S(p; |f|)$ for every
partition $p$.

Exercise 6  Show that if $f: I \to R^n$ is Riemann integrable, then so is $|f|$, and deduce that
Theorem 2.6 holds for any Riemann integrable $f$.

Exercise 7  Show that if $f: I \to R^1$ is Riemann integrable, then so is $f^2$.

Exercise 8  Show that if $f, g: I \to R^n$ are Riemann integrable, then so is $(f, g)$. (Hint:
$(f, g) = \tfrac14 |f + g|^2 - \tfrac14 |f - g|^2$.)

DEFINITION 2.7  If $f: X \to R^n$ is a function from any set $X$ into $R^n$, and $f(x) = (f_1(x), \ldots,
f_n(x))$, then the function $f_k: X \to R^1$ is called the $k$th coordinate function of $f$.


3 TANGENT LINES, VELOCITY, AND ACCELERATION

A path in $R^n$ was defined to be a continuous function from a closed bounded
interval into $R^n$. This is not in perfect accord with geometric intuition, for a
function contains too much information. Think of it this way. Suppose that
an object is moving around in $R^n$ during a time interval $I$. For each $t \in I$, let
$f(t)$ denote the location of the object at the time $t$. The function $f$ describes
the motion completely, so it must contain not only the information about where
the point goes, but with what speed, with what acceleration, and so on. Intuition
is not very explicit as to whether two paths are the same if they go to the
same places, but differ in speed, acceleration, and so on. For instance, are the
paths to school the same if on the one hand you go there directly, and on the
other you sit down for a while at the sidewalk cafe on the way? Is the path to
the dentist's office the same as the path home? In both cases most people
would say no, that the one is somewhat more pleasant than the other.

It is not at all a simple matter to give a rule that defines when two functions
represent the same path. In fact, it all depends on the problem. In some
problems two given functions should be considered different representations of
the same path, while in others the same two functions should be considered
different paths. Therefore, we shall stick with the original definition that the
path is the function. It is not really harmful that the function contains too
much information (as it would be if it contained too little). Because of the
extra information, however, the function may be "bad" at some points where
the intuitive path is not bad at all. An example is given below.

DEFINITION 3.1  The tangent line to the path $f: I \to R^n$ at the point $f(a)$ is the oriented line that
passes through $f(a)$ and has the direction of $f'(a)$, provided $f$ is differentiable
at $a$ and $f'(a) \ne 0$.

The fact that the tangent line is oriented reflects the fact that there is a "positive
direction" on the path itself. (The path to the dentist's office is not the same
as the path home.) The unoriented tangent line is the set

$$l = \{x : x = f(a) + t f'(a),\ t \text{ real}\},$$

and the orientation is the direction of $f'(a)$, which is

$$\frac{f'(a)}{|f'(a)|}.$$

Exercise 1  If $f: [0, 1] \to R^n$ is the path to the dentist's office, what is the path home?
Show that the tangent line to the path home at any point is the same as the
tangent line to $f$, but with the opposite orientation.

Example 1  If $f(t) = (t, t^3)$ in $R^2$, then $f(0) = (0, 0)$ and $f'(0) = (1, 0)$, so the tangent line
to $f$ at the origin is the set of points $x$ of the form $x = (0, 0) + t(1, 0) = (t, 0)$,
which is just the $x_1$ axis. From the intuitive point of view, the function $g(t) =
(t^3, t^9)$ represents the same path, but $g'(0) = (0, 0)$; so the tangent line to $g$ at
the origin is undefined by Definition 3.1. Again, from the intuitive point of
view, the function $h(t) = (\sqrt[3]{t}, t)$ represents the same path, but $h$ is not even
differentiable at 0. This is an illustration of how $g$ and $h$ may be "bad" at the
origin even though the intuitive path is not bad at all.

One situation in which it is very tempting to say that $f: I \to R^n$ and
$g: J \to R^n$ represent the same intuitive path is when there is a strictly increasing
function $\varphi: J \to I$ such that $g = f \circ \varphi$. (From the point of view of motion, $\varphi$
simply gives a different way of measuring time.) It is to be hoped that $f$ and
$g$ have the same tangent line, but Example 1 shows the need for caution, for it
is of just this nature with $\varphi(t) = t^3$.

THEOREM 3.2  If $g = f \circ \varphi$, $\varphi$ is differentiable at $a$ with $\varphi'(a) > 0$, and $f$ is differentiable
at $\varphi(a)$ with $f'(\varphi(a)) \ne 0$,
then $f$ and $g$ have the same tangent line at the point $g(a) = f(\varphi(a))$.
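
The caution raised by Example 1 can be seen numerically. The sketch below is illustrative only: it assumes the reading $f(t) = (t, t^3)$, $g(t) = (t^3, t^9)$, $h(t) = (\sqrt[3]{t}, t)$ for the three parametrizations of the example, and the helper names `cbrt` and `difference_quotient` are hypothetical, introduced just for this check.

```python
def f(t):
    return (t, t ** 3)

def g(t):
    # g is f composed with phi(t) = t**3 (assumed reading of Example 1)
    return (t ** 3, t ** 9)

def cbrt(t):
    """Real cube root, valid for negative t as well."""
    return t ** (1 / 3) if t >= 0 else -((-t) ** (1 / 3))

def h(t):
    # h is f composed with the cube root
    return (cbrt(t), t)

def difference_quotient(path, a, step):
    """Return (path(a + step) - path(a)) / step, coordinate by coordinate."""
    p, q = path(a + step), path(a)
    return tuple((pi - qi) / step for pi, qi in zip(p, q))

# Difference quotients at a = 0 for shrinking steps.
for step in (1e-2, 1e-4, 1e-6):
    print(step,
          difference_quotient(f, 0.0, step),
          difference_quotient(g, 0.0, step),
          difference_quotient(h, 0.0, step))

q_f = difference_quotient(f, 0.0, 1e-6)
q_g = difference_quotient(g, 0.0, 1e-6)
q_h = difference_quotient(h, 0.0, 1e-6)
```

The quotients for `f` settle near (1, 0), those for `g` shrink to (0, 0), and the first coordinate of the quotients for `h` grows without bound, matching the three behaviours described in the example.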


Proof  Since the two lines pass through the same point, all that is necessary is to
show that they have the same direction, that is, that $g'(a)$ is a positive
multiple of $f'(\varphi(a))$. This results from the following theorem on the
derivative of a composite function.

THEOREM 3.3  Let $\varphi: J \to I$ and $f: I \to R^n$. If $\varphi$ is differentiable at $a$, and $f$ is differentiable at $\varphi(a)$,
then $f \circ \varphi$ is differentiable at $a$, and

$$(f \circ \varphi)'(a) = \varphi'(a) f'(\varphi(a)).$$

Exercise 2  Prove Theorem 3.3 by considering the coordinate functions of $f$ separately
(Theorem 2.2), and then by using the chain rule for composite functions in
Section 6 of Chapter 2.

From the geometrical point of view the justification of Definition 3.1 is the
following.

THEOREM 3.4  Let $f: I \to R^n$ be differentiable at $a$, with $f'(a) \ne 0$. The direction of the
tangent line at $f(a)$ is the limit of the directions of the chords from $f(a)$ to
$f(b)$ as $b \to a$ from the right.

Proof  The chord from $f(a)$ to $f(b)$ is the line joining these two points with the
direction

$$\frac{f(b) - f(a)}{|f(b) - f(a)|} = \frac{f(b) - f(a)}{b - a} \cdot \frac{b - a}{|f(b) - f(a)|}. \tag{1}$$

As $b \to a$ from the right, the first factor has the limit $f'(a)$ and the second
factor has the limit $1/|f'(a)|$.

Exercise 3  What happens when $b \to a$ from the left?

Exercise 4  If $f'(a) \ne 0$, then $f(b) \ne f(a)$ for all $b$ sufficiently close to $a$, so formula (1)
makes sense for all $b$ close to $a$.

Exercise 5  In Definition 3.1 we speak of the tangent line to the path $f$ at the point $f(a)$.
Give several examples of various kinds to show that we should really speak of
the tangent line at $a$, not the tangent line at $f(a)$, even though this is not so
pleasing to the intuition.

Notice that the phrase "limit of directions" needs no definition. A direction
is a point of absolute value 1, so nothing more is involved than a limit of
points.


DEFINITION 3.5  Let an object move in $R^n$ during a time interval $I$. For each $t \in I$, let $f(t)$
be the location of the object at time $t$. The velocity at time $t = a$ is $f'(a)$, the
speed is $|f'(a)|$, and the acceleration is $f''(a)$.

The definition is formally the same as the one in Chapter 1 for motion on a line.
Furthermore, the velocity at time $t = a$ is the limit of the average velocity over
the time interval from $t = a$ to $t = x$ as the time interval goes to 0. Indeed,
this average velocity is (by definition) the difference between the final and
initial positions divided by the time interval, which is nothing but

$$\frac{f(x) - f(a)}{x - a}.$$

Notice that the direction of the velocity is the direction of the tangent line.


4 GEOMETRIC MODELS OF $R^n$

Although $R^n$ is not by its definition a geometric object (it is a set of $n$-tuples),
it is convenient to have a geometric model in mind to suggest terminology,
results, and sometimes proofs. So far the model has been the everyday three-dimensional
space, which is a model of $R^3$, and by analogy of $R^n$. The everyday
space becomes a model of $R^3$ by assigning to the triple $(x_1, x_2, x_3) \in R^3$ the
point with these coordinates relative to given coordinate axes.

A second model is also useful. It is obtained by assigning to the triple
$(x_1, x_2, x_3)$ not the point with these coordinates, but rather the line segment from
the origin to that point.

We have seen in Section 3 that the velocity of a moving object at a given
time is a point in $R^n$. The information that is expected from a velocity consists
of a direction (the direction of the motion) and a positive number (the magnitude
of the velocity, or speed). Now, a point $x \ne 0$ in $R^n$ does carry just this information.
It determines the direction $x/|x|$ and the positive number $|x|$, and, in
turn, it is determined by these two quantities. However, from the intuitive
geometric point of view, a point does not carry the flavor of a magnitude and a
direction. It is rather a line segment starting at the origin that carries this
flavor.

The geometric language for $R^n$ derives about equally from the two models,
so both must be kept in mind, and the good model to use in a given situation
must be determined by the context. This is a bother at first, but becomes
fairly easy with practice. Of course, it is not necessary to keep any model in
mind. Everything is on a perfectly sound, if somewhat mysterious, footing
right in $R^n$ itself. Often a point in $R^n$ is called a vector. This is a rather
neutral term that does not suggest either model, as does the term point, although
perhaps it does have a slight bias toward the line-segment model.

Back in Section 1 of Chapter 7 (Exercise 2), the formula

$$(x, y) = |x|\,|y| \cos\theta \tag{1}$$

was established for dimension 2, where $\theta$ is the angle between the half-lines $0x$
and $0y$. With the line-segment interpretation, we can say simply that $\theta$ is the
angle between $x$ and $y$. The formula can serve as the definition of the angle
between two vectors in $R^n$.

DEFINITION 4.1  The angle between the nonzero vectors $x$ and $y$ of $R^n$ is the angle $\theta$ defined by

$$\cos\theta = \frac{(x, y)}{|x|\,|y|}, \qquad 0 \le \theta \le \pi. \tag{2}$$

The definition makes sense because on the one hand the cosine is one to one on
the interval $0 \le \theta \le \pi$ and takes every value between $-1$ and 1; and on the
other the number $(x, y)/|x|\,|y|$ does lie between $-1$ and 1 by virtue of the
Cauchy-Schwarz inequality.

If one of the two vectors is 0, the angle is undefined. This is as it should be,
because then the line segment has length 0, and the angle between some line
segment and a segment of length 0 does not make geometrical sense. Notice
that the angle from $x$ to $y$ cannot be distinguished from the angle from $y$ to $x$.

DEFINITION 4.2  The vectors $x$ and $y$ of $R^n$ are perpendicular, or orthogonal, if $(x, y) = 0$.

If both $x$ and $y$ are $\ne 0$, then the definition is consistent with Definition 4.1:
the two are perpendicular if the angle between them is 90°. However, the
vector 0 is perpendicular to every vector. There is no reason why this should
seem geometrically "right." It is a technical convenience which is made part
of the definition. Notice that 0 is the only vector that is perpendicular to every
vector; indeed, it is the only one that is perpendicular to itself, for, if $x \ne 0$,
then $(x, x) = |x|^2 \ne 0$.

As an illustration of these concepts we shall consider a couple of problems
about motion. To do so, we need Theorem 4.3.

THEOREM 4.3  Let $f: I \to R^n$ and $g: I \to R^n$. Then

$$(f, g)' = (f', g) + (f, g').$$

The meaning of the theorem is this. It is assumed that both $f$ and $g$ are
differentiable at some point $a$, and that $h(t) = (f(t), g(t))$, so that $h: I \to R^1$.
The assertion is that $h$ is differentiable at $a$ and that

$$h'(a) = (f'(a), g(a)) + (f(a), g'(a)).$$
Proof  If $f(t) = (f_1(t), \ldots, f_n(t))$ and $g(t) = (g_1(t), \ldots, g_n(t))$, then

$$h(t) = \sum_{k=1}^{n} f_k(t) g_k(t),$$

so

$$h'(t) = \sum_{k=1}^{n} f_k'(t) g_k(t) + \sum_{k=1}^{n} f_k(t) g_k'(t) = (f'(t), g(t)) + (f(t), g'(t)).$$
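
The product rule for the inner product can be checked numerically on a concrete pair of paths. The following Python sketch is illustrative only: the two paths in $R^3$, the point $a = 0.7$, and the helper names (`inner`, `central_difference`, `product_rule`) are arbitrary choices made here, not part of the text.

```python
import math

def f(t):
    return (math.cos(t), math.sin(t), t)

def g(t):
    return (t * t, 1.0, math.exp(t))

def df(t):
    # Coordinate-wise derivative of f.
    return (-math.sin(t), math.cos(t), 1.0)

def dg(t):
    # Coordinate-wise derivative of g.
    return (2 * t, 0.0, math.exp(t))

def inner(x, y):
    """Inner product (x, y) of two vectors given as tuples."""
    return sum(xk * yk for xk, yk in zip(x, y))

def h(t):
    return inner(f(t), g(t))

def product_rule(t):
    """Right-hand side (f', g) + (f, g') of the theorem."""
    return inner(df(t), g(t)) + inner(f(t), dg(t))

def central_difference(func, t, step=1e-6):
    """Symmetric difference quotient approximating func'(t)."""
    return (func(t + step) - func(t - step)) / (2 * step)

a = 0.7
numeric = central_difference(h, a)
exact = product_rule(a)
print(numeric, exact)  # the two values should agree to many digits
```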


Example 1  If a point moves on a sphere, then the velocity at any point is perpendicular to
the radius drawn to that point.

Proof  If $f(t)$ lies on the sphere $S(a; r)$ for each $t$, then

$$r^2 = |f(t) - a|^2 = (f(t) - a, f(t) - a).$$

Differentiation by the formula of the last theorem gives

$$0 = 2(f'(t), f(t) - a),$$

and $f'(t)$ is the velocity, while $f(t) - a$ has the direction of the radius.

Another way of saying the same thing is that if a path lies on a surface
(in this case the sphere), then the tangent line to the path is tangent to the
surface. But tangents to surfaces have not yet been discussed.

Example 2  If a point moves at constant speed, then the velocity and acceleration vectors
are perpendicular.

Proof  If the speed is $v$, then we have

$$v^2 = |f'(t)|^2 = (f'(t), f'(t)),$$

and differentiation gives

$$0 = 2(f''(t), f'(t)).$$


Example 3  Let a point move at the constant speed $v$ around the circle $S(0; 1)$ in the plane.
By Example 1 we have $0 = (f', f)$, and differentiating again we get
$0 = (f'', f) + (f', f') = (f'', f) + v^2$. Now, if $f'$ is perpendicular to $f$ and $f''$ is
perpendicular to $f'$, it follows that $f''$ is a multiple of $f$, say $f'' = \alpha f$. (This is
where the fact that the dimension is 2 comes in.) We get

$$-v^2 = (f'', f) = (\alpha f, f) = \alpha,$$

so finally

$$f'' = -v^2 f. \tag{3}$$

This is the formula for centrifugal force. It says that when a point moves
around a circle of radius 1 at constant speed $v$, there is an acceleration that is
directed toward the center of the circle and has magnitude $v^2$.

Exercise 1  Explain why the acceleration is directed toward the center of the circle.

Exercise 2  A point moves in the plane at speed 1 along the curve y = x. Find the
acceleration at the point (x, y) of the curve.
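
Formula (3) of Example 3 can be checked on the standard parametrization of uniform circular motion. The sketch below is a numerical illustration under assumptions made here: the path $f(t) = (\cos vt, \sin vt)$, the values $v = 3$ and $t_0 = 0.4$, and the helper names `position` and `second_difference` are all illustrative choices.

```python
import math

def position(t, v):
    """Point moving on the unit circle at constant speed v."""
    return (math.cos(v * t), math.sin(v * t))

def second_difference(path, t, step=1e-4):
    """Central second difference approximating the acceleration path''(t)."""
    p_plus, p, p_minus = path(t + step), path(t), path(t - step)
    return tuple((a - 2 * b + c) / step ** 2
                 for a, b, c in zip(p_plus, p, p_minus))

v = 3.0
t0 = 0.4
path = lambda t: position(t, v)

acc = second_difference(path, t0)                         # numerical f''(t0)
predicted = tuple(-v ** 2 * c for c in position(t0, v))   # -v^2 f(t0)
error = max(abs(a - b) for a, b in zip(acc, predicted))
print(acc, predicted, error)
```

The numerical acceleration points from the moving point back toward the origin and has length close to $v^2$, in agreement with the formula.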


5 MISSILES, MOONS, AND SO ON

Let us shoot off a missile and try to find out what becomes of it.

The basic law governing motion in the three-dimensional space is again the
second law of Newton (see Section 4 of Chapter 1), but in vector form: The
acceleration of a moving object is proportional to the force acting on it. In other words,
there is a constant $m$ (depending on the object, and called its mass) such that if $x(t)$
is the position of the object at time $t$, and $f(t)$ is the force acting on it at time $t$,
then

$$m x''(t) = f(t) \quad\text{for all } t. \tag{1}$$

Notice that for the formula to make sense the force must be a vector, which is
consistent with the usual concept of a force as a quantity with a magnitude and
a direction.

The second physical law that has to be used is the law of gravity, which
will determine the force in our present problem. It says that two objects of
masses $m$ and $M$ at a distance $r$ apart attract one another with a force of
magnitude

$$\frac{mM}{r^2}, \tag{2}$$

provided the units for measuring force and acceleration are chosen suitably.
[It is remarkable that the mass that appears in formula (2) is the same as the
mass that appears in formula (1). The verification of this fact was the object
of the famous experiment of Galileo's in which he dropped stones from the
leaning tower of Pisa, or so the story goes.]


170 


8/functions from R} to R" 


Exercise 1 


In applying these two laws to discover where the missile goes, we shall make 
the assumption that the rocket engines have been shut off and that the missile 
and the earth are coasting along under the influence of their mutual gravita- 
tional attraction alone. This is absurd, of course. The main force acting on 
the earth is not the attraction of the missile but the attraction of the sun. 
However, the calculations are interesting, and the results are in fact realistic 
in the similar problem of the orbit of the earth (or other planets or meteors) 
around the sun. 

Let m and M be the masses of the missile and the earth. Choose coordinates
in the three-dimensional space, and let y(t) and z(t) be the positions of the missile
and the earth at time t. The distance between the two at time t is then |y(t) − z(t)|,
and the direction from the missile to the earth is (z(t) − y(t))/|z(t) − y(t)|.
According to formula (2), the force on the missile at time t is


$$f(t) = mM\,\frac{z(t) - y(t)}{|z(t) - y(t)|^3},$$


and the force on the earth is the opposite (i.e., negative) of this. Thus, formula
(1) gives the two equations


$$m y'' = mM\,\frac{z - y}{|z - y|^3}, \qquad M z'' = mM\,\frac{y - z}{|z - y|^3}, \tag{3}$$

which hold at each point t.

It simplifies matters a little to calculate the path of the missile relative to
the earth rather than to calculate the two paths separately. This means to
calculate the function x = y − z rather than to calculate y and z separately.
The equations (3) give the following equation for x:


$$x'' = -\mu\,\frac{x}{|x|^3}, \qquad \text{where } \mu = m + M. \tag{4}$$


Before going on we should understand exactly what is at issue here. We
are assuming that the function x has two continuous derivatives and satisfies
equation (4) at each point of some open interval I, and we are trying to discover
what we can about x under these conditions. In particular, equation (4)
implies that x(t) ≠ 0 for each t ∈ I. [From the physical point of view this
condition is obvious, for if $x(t_0) = 0$, then the missile has crashed at time $t_0$, and
there is nothing more to be said.]


Exercise 1  Consideration of x rather than y and z amounts to putting the origin of the
coordinate system at the center of the earth so that the coordinate system moves 
right along with the earth. Why can’t this be done at the beginning? 




The first step is to show that the point x(t) remains in a fixed plane through
the origin for all t. To see this, write equation (4) for the ith and jth coordinates:


$$x_i'' = -\mu\,\frac{x_i}{|x|^3}, \qquad x_j'' = -\mu\,\frac{x_j}{|x|^3}.$$


Multiply the first by $x_j$ and the second by $x_i$ and subtract, to get


$$0 = x_i'' x_j - x_j'' x_i = (x_i' x_j - x_j' x_i)',$$


and hence that $x_i' x_j - x_j' x_i$ is constant. If h is the constant vector with coordinates


$$h_1 = x_2 x_3' - x_3 x_2', \qquad h_2 = x_3 x_1' - x_1 x_3', \qquad h_3 = x_1 x_2' - x_2 x_1', \tag{5}$$

then

$$(x, h) = x_1 h_1 + x_2 h_2 + x_3 h_3 = 0.$$


(Everything cancels out.) This means that for every t, x(t) lies in the plane
that passes through the origin and is perpendicular to the fixed vector h,
provided of course that h ≠ 0.


Exercise 2  If h = 0, then x(t) lies on a line through the origin. [Hint: On any interval
where $x_2 \ne 0$, the fact that $h_3 = 0$ gives $(x_1/x_2)' = 0$; hence $x_1 = c x_2$. Similarly,
the fact that $h_1 = 0$ gives $x_3 = d x_2$. Now use connectedness and the fact that
x(t) ≠ 0 for each t ∈ I to show that x(t) lies on the line with equations $x_1 = c x_2$
and $x_3 = d x_2$ for all t ∈ I.]


Now we shall choose the coordinates in $R^3$ so that x(t) lies in the plane
$x_3 = 0$ for each t ∈ I. Then we can simply forget about the three-dimensional
space and treat the problem in $R^2$ instead. Equation (4) remains the same.
The complex multiplication on the plane (especially multiplication by i) is
what will let us form the right expressions. Remember that the inner product
is expressed in terms of the complex product by the formula

$$(z, w) = \operatorname{Re} z\bar{w},$$

so in particular we have


$$(iz, z) = 0 \quad \text{for every } z \in R^2.$$


Take the inner product with ix on both sides of (4). The result is that
(ix, x'') = 0, and hence that (ix, x')' = (ix, x'') + (ix', x') = 0. Therefore,
(ix, x') is a constant, which we shall call k:


$$(ix, x') = k \quad (= \text{constant}). \tag{6}$$

Now let r = |x| and θ = x/|x|, so


$$x = r\theta \quad \text{and} \quad x' = r'\theta + r\theta'.$$




[This is all right because x(t) ≠ 0 for every t ∈ I.] According to Example 1,
Section 4, θ' and θ are perpendicular, so θ' is a (real) multiple of iθ, say θ' = aiθ,
in which case x' = r'θ + ariθ; then k = (ix, x') = r'(ix, θ) + ar(ix, iθ) = ar².
Thus,


$$\theta' = \frac{k}{r^2}\, i\theta.$$

This formula and the basic equation (4) show that kix'' = −μθ', and integration
gives

$$k i x' = -\mu(\theta + e), \tag{7}$$

where e is a constant vector. Now, k = (ix, x') = −(x, ix'). Therefore,


$$k^2 = -(x, k i x') = \mu (x, \theta + e).$$


If we let ε = |e| and let φ be the angle between θ and the fixed vector e, we get
the final equation


$$r(1 + \varepsilon \cos\varphi) = \frac{k^2}{\mu} \tag{8}$$


for the equation of the curve on which the missile moves relative to the earth
(in polar coordinates). The constant μ is the sum of the masses, and the
constants k and e are determined by any initial position $x_0 = x(t_0)$ and velocity
$v_0 = x'(t_0)$ by


$$k = (i x_0, v_0), \qquad e = -\frac{k}{\mu}\, i v_0 - \frac{x_0}{|x_0|}. \tag{9}$$


If the coordinates in the plane of the motion are chosen so that e lies on the
$x_1$ axis, then in rectangular coordinates [(x, y) instead of $(x_1, x_2)$ to avoid
subscripts] the equation becomes


$$(1 - \varepsilon^2) x^2 + \frac{2\varepsilon k^2}{\mu}\, x + y^2 = \frac{k^4}{\mu^2}. \tag{10}$$


It is shown in analytic geometry that this is an ellipse if ε < 1, a hyperbola if
ε > 1, and a parabola if ε = 1. (To study the curve, simply complete the
square in the x's.) Thus, an orbit must be one of these three curves; which
one depends on the initial position and velocity.
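The conclusion can be checked numerically. The following is a minimal sketch, not part of the text: it integrates equation (4) in the plane of the motion with a classical Runge-Kutta step (a method chosen here only for illustration), computes k and e from formula (9), and measures how far the computed orbit strays from equation (8), written as r + (x, e) = k²/μ. The function names, the step size, and the sample initial data are all assumptions made for the example.

```python
import math

def accel(x, y, mu):
    """Right-hand side of equation (4): x'' = -mu * x / |x|^3, in the plane."""
    r3 = math.hypot(x, y) ** 3
    return -mu * x / r3, -mu * y / r3

def rk4_step(state, dt, mu):
    """One classical Runge-Kutta step for the first-order system (x, y, vx, vy)."""
    def deriv(s):
        x, y, vx, vy = s
        ax, ay = accel(x, y, mu)
        return (vx, vy, ax, ay)

    def shift(s, d, h):
        return tuple(si + h * di for si, di in zip(s, d))

    k1 = deriv(state)
    k2 = deriv(shift(state, k1, dt / 2))
    k3 = deriv(shift(state, k2, dt / 2))
    k4 = deriv(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def constants(x, y, vx, vy, mu):
    """k = (i x0, v0) and e = -(k/mu) i v0 - x0/|x0|, as in formula (9)."""
    k = x * vy - y * vx                 # (ix, v), since ix = (-y, x)
    r = math.hypot(x, y)
    ex = (k / mu) * vy - x / r          # i v = (-vy, vx)
    ey = -(k / mu) * vx - y / r
    return k, ex, ey

def orbit_residual(x, y, k, ex, ey, mu):
    """r (1 + eps cos(phi)) - k^2/mu, using r * eps * cos(phi) = (x, e)."""
    return math.hypot(x, y) + (x * ex + y * ey) - k * k / mu

def max_drift(state, mu, dt, steps):
    """Largest violation of (6) or (8) along the computed path, and eps = |e|."""
    k, ex, ey = constants(*state, mu)
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt, mu)
        x, y, vx, vy = state
        worst = max(worst,
                    abs(orbit_residual(x, y, k, ex, ey, mu)),
                    abs((x * vy - y * vx) - k))
    return worst, math.hypot(ex, ey)
```

For instance, `max_drift((1.0, 0.0, 0.0, 1.2), 1.0, 0.01, 2000)` starts the missile at distance 1 with speed 1.2 perpendicular to the radius. Here ε = 0.44 < 1, so the orbit is an ellipse, and the reported drift should be small, limited only by the error of the numerical integration.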


Exercise 3  Discuss the curves with equation (10). What is the story if k = 0?


Exercise 4  The center of mass of the earth and the missile is defined to be the point


$$w = \frac{my + Mz}{m + M}.$$


Show that the center of mass moves with constant velocity (i.e., constant speed
along a straight line) and find the formula for w.


Exercise 5  Use the foregoing results to describe the paths y − w and z − w (i.e., the motion
of the missile and the earth relative to the center of mass). With Exercises 4 
and 5 you will have the individual paths y and z. 


Exercise 6  If x is any path in the plane and r(t) and φ(t) are polar coordinates of x(t), then

$$(ix, x') = r^2 \varphi'. \tag{11}$$


Therefore, equation (6) and the formula for area in polar coordinates show that
the area swept out by the radius drawn from the earth to the missile during the
time interval $[t_1, t_2]$ is


$$\text{area} = \frac{k}{2}\,(t_2 - t_1). \tag{12}$$


The ideas in this section had a profound effect on the early development of 
physics—but they emerged in the reverse order. By detailed analysis of actual 
observations, Kepler deduced that the motion of a planet about the sun is 
governed by the following celebrated laws: 


1. The orbit of the planet is planar and is an ellipse with focus at the sun. 

2. The radius from the sun to the planet sweeps out equal areas in equal times. 

3. The square of the period (period = length of time for one orbit) is proportional 
to the cube of the mean distance from the sun. 


From these laws Newton was able to deduce the law of gravitation (2). 

As for law 1, we have shown that the orbit is planar and that in fact it is an 
ellipse. (Hyperbola and parabola are ruled out because they require that the 
planet disappear in the distance.) We have not discussed the focus, but if you 
know what the focus of an ellipse is, you can show easily that the focus is the sun. 

As for law 2, this is just formula (12). And as for law 3, we shall omit it, 
but the interested student may be able to work it out for himself. 

Kepler’s laws were the monumental result of several years of extraordinary 
labor, and he was highly pleased with them. 




Johann Kepler 


“The die is cast, the book is written, to be read either now or by posterity, 
I care not which; it may well wait a century for a reader, as God has waited six 
thousand years for an observer.” 


ARC LENGTH 


The ideas needed to discuss arc length in $R^n$ are already present in the
two-dimensional case, which we have discussed briefly in Section 7 of Chapter 4,
and in the theory of the Riemann integral.


DEFINITION 6.1  Let $f:[a, b] \to R^n$ be a path in $R^n$. If $p = (t_0, \ldots, t_m)$ is a partition
of [a, b], let


$$l(p) = \sum_{i=1}^{m} |f(t_i) - f(t_{i-1})|.$$


The path has length L if $\lim_{|p| \to 0} l(p) = L$ in the sense that for each positive
number ε there is a positive number δ such that if |p| < δ, then
|l(p) − L| < ε.


The points $f(t_i)$ are the vertices of an inscribed polygon, and the number l(p) is
the sum of the distances between successive vertices, that is, the length of the
polygon. Thus, the length of the path is the limit of the lengths of
approximating inscribed polygons.




Exercise 1  Define the length of a path in any metric space.


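LEMMA 6.2  If r is a refinement of p, then l(r) ≥ l(p).


Proof  If r is obtained from p by adding just one point t which is between $t_{i-1}$ and
$t_i$, then the term $|f(t_i) - f(t_{i-1})|$ which occurs in l(p) is replaced by the
sum $|f(t_i) - f(t)| + |f(t) - f(t_{i-1})|$. The triangle inequality gives


$$|f(t_i) - f(t_{i-1})| \le |f(t_i) - f(t)| + |f(t) - f(t_{i-1})|,$$

and this shows that l(r) ≥ l(p). The general case follows by adding the
points of r one at a time.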


THEOREM 6.3  The length of the path f exists if and only if the numbers l(p) are bounded, and,
if this is the case, then


$$L = \sup\{l(p) : p \text{ is a partition}\}.$$


Proof  First we shall show that if the length L exists, then for every partition q we
must have l(q) ≤ L. Let ε be a given positive number and choose δ in
accordance with the definition. Fix a partition p with |p| < δ. If r is
the common refinement of p and q, then |r| < δ; so by the choice of δ and
by Lemma 6.2 we have


$$l(q) \le l(r) \le L + \varepsilon.$$


Since this holds for every ε > 0, it follows that l(q) ≤ L.

The proof of the other half is very much like the proof of Theorem 6.1
of Chapter 4. Suppose that the upper bound S of the l(p) is finite and
let ε > 0 be given. Fix a partition q such that


$$l(q) \ge S - \varepsilon,$$


and let N be the number of points in q and $\delta_0$ be the length of the smallest interval in q.
Choose $\delta < \delta_0$ so that if |t − s| < δ, then |f(t) − f(s)| < ε. We shall
show that if |p| < δ, then


$$l(p) \ge S - (2N + 1)\varepsilon, \tag{1}$$


which will prove the theorem because we have automatically that l(p) ≤ S.

If r is the common refinement of p and q, then on the one hand we
have l(r) ≥ l(q) ≥ S − ε, and on the other hand we can compare l(r)
and l(p). Suppose that t is a point of r that is not in p. Then the two
adjacent points of r, call them t' and t'', must belong to p by virtue of the
fact that $|p| < \delta < \delta_0$. If we replace the term |f(t'') − f(t')|, which
occurs in l(p), by the sum |f(t'') − f(t)| + |f(t) − f(t')|, which occurs in l(r),
we increase the sum by at most 2ε. If we do this for each point of r that




is not in p, we increase the sum by at most 2Nε. Thus, we have

$$S - \varepsilon \le l(r) \le l(p) + 2N\varepsilon.$$


This proves inequality (1) and hence the theorem.


THEOREM 6.4  Let $f:[a, b] \to R^n$ be a path, and let c be a point between a and b. The
arc length on [a, b] exists if and only if the arc lengths on [a, c] and [c, b] both
exist, in which case it is of course the sum.


Exercise 2  This theorem is obvious from the preceding one. Prove it.


THEOREM 6.5  Let $f:[a, b] \to R^n$ be a path. If f' is uniformly continuous on (a, b), then
the arc length exists and is given by


$$L = \int_a^b |f'(t)|\, dt.$$

Exercise 3  The proof is just like the proof of Theorem 7.4 of Chapter 4. Carry it out.


Remark  By combining Theorems 6.4 and 6.5, we can claim that the arc length of the
path f exists and is given by the integral any time the interval [a, b] can be
partitioned into a finite number of subintervals in such a way that f' is uniformly
continuous on each open subinterval. Such paths are usually called piecewise
smooth. Every polygon is, of course, piecewise smooth.
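The definition of length and the integral formula of Theorem 6.5 can be compared numerically. The sketch below is only an illustration under stated assumptions: the helix f(t) = (cos t, sin t, t) on [0, 2π] is a sample path chosen here, for which |f'(t)| = √2 and the integral is 2π√2, and the helper names are hypothetical. Doubling the number of subintervals gives a refinement, so by Lemma 6.2 the polygon lengths should not decrease.

```python
import math

def polygon_length(f, partition):
    """l(p): the sum of |f(t_i) - f(t_{i-1})| over a partition p."""
    points = [f(t) for t in partition]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def uniform_partition(a, b, m):
    """The partition of [a, b] into m subintervals of equal length."""
    return [a + (b - a) * i / m for i in range(m + 1)]

def helix(t):
    """Sample path in R^3 (an assumption for this example)."""
    return (math.cos(t), math.sin(t), t)

def inscribed_lengths(f, a, b, sizes):
    """Polygon lengths l(p) for uniform partitions with the given sizes."""
    return [polygon_length(f, uniform_partition(a, b, m)) for m in sizes]

if __name__ == "__main__":
    sizes = [4, 8, 16, 32, 64, 128, 256, 512, 1024]
    exact = 2 * math.pi * math.sqrt(2)   # integral of |f'(t)| over [0, 2*pi]
    for m, length in zip(sizes, inscribed_lengths(helix, 0.0, 2 * math.pi, sizes)):
        print(m, length, exact - length)
```

The printed differences are positive and shrink as the partition is refined, which is the behavior Theorem 6.3 describes: the length is the supremum of the numbers l(p).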


It is sometimes advantageous to choose the parametric equations for a 
given geometric path in such a way that the arc length along the path is the 
parameter. We shall have to use this idea later, so let us see what is 
involved. 

Let $f:[a, b] \to R^n$ be a path of finite length L. For a ≤ t ≤ b, let L(t) be
the length of the path over the interval [a, t].


Exercise 4  Show that L(t) is continuous and nondecreasing on a ≤ t ≤ b, and that if f is
not constant on any subinterval, then L(t) is strictly increasing. In this latter
case, the inverse function $\lambda = L^{-1}$ is continuous and strictly increasing on the
interval [0, L].


DEFINITION 6.6  Let $f:[a, b] \to R^n$ be a path of finite length L that is not constant on any
subinterval. The path $g = f \circ \lambda$ is called the parametric representation of f
by arc length.


Exercise 5  Let g be the parametric representation of f by arc length. As s goes from 0 to L,
the point g(s) traces out the same geometric path as does f(t) when t goes from a
to b. For each s, the arc length of g on [0, s] is equal to s.




We shall need the results of Exercises 4 and 5 only when the initial path f 
satisfies the conditions: 


1. f' is uniformly continuous on (a, b).


2. f'(t) ≠ 0 on (a, b).


In this case the results are quite obvious from the integral formula for arc 
length (Theorem 6.5). 


Exercise 6 Establish the results of Exercises 4 and 5 in this special case by using the integral 
formula for arc length. Show that |g’(s)| = 1 for each s. 


Exercise 7 Parametric representation by arc length is more of theoretical importance than 
of practical importance. To get an idea of what is involved, try to find the 
parametric representation by arc length in the case of the circle $x^2 + y^2 = r^2$,
the parabola $y = x^2$, and the ellipse $x^2/a^2 + y^2/b^2 = 1$.


Algebra and Geometry
in $R^n$


1  SUBSPACES


The subject of Chapter 8 was paths in $R^n$ and functions from $R^1$ to $R^n$. The
next step is surfaces in $R^n$ and functions from $R^m$ to $R^n$. But first it is necessary
to take a close look at the simplest case in which the surfaces are planes and the 
functions are linear. 


DEFINITION 1.1  A subspace of $R^n$ is a subset V with the property that if x and y are any two
points of V, and a is any real number, then


$$x + y \in V \quad \text{and} \quad ax \in V.$$


The simplest subspace is the set V = {0} (consisting of the vector 0 alone). 
The next is the set of all multiples of some nonzero vector v—in other words, 
the line through v and the origin. 


Exercise 1  Show that the line through v and 0 is a subspace and that it is the smallest
subspace containing v.


Next consider two vectors v and w that are not on a line through 0, and let 


$$V = \{\alpha v + \beta w : \alpha \text{ and } \beta \text{ are real}\}. \tag{1}$$


Exercise 2  Show that V is a subspace and that it is the smallest subspace containing both
v and w.




The subspace V in formula (1) is called the two-dimensional plane passing 
through v, w, and 0. 


Exercise 3  Discuss this last statement geometrically in $R^3$.


DEFINITION 1.2  If S is any subset of $R^n$, then [S] is the set of all vectors of the form

$$x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_m v_m, \tag{2}$$

where $v_1, \ldots, v_m$ are vectors in S, and $\alpha_1, \ldots, \alpha_m$ are real. [S] is
called the span of S, or the subspace generated by S.


THEOREM 1.3  For any set S, [S] is a subspace, and it is the smallest subspace containing S.


Proof  If x and y belong to [S], then

$$x = \alpha_1 v_1 + \cdots + \alpha_m v_m \quad \text{and} \quad y = \beta_1 w_1 + \cdots + \beta_k w_k,$$

where each $v_i$ and each $w_j$ belong to S. Then


$$x + y = \alpha_1 v_1 + \cdots + \alpha_m v_m + \beta_1 w_1 + \cdots + \beta_k w_k$$


and


$$ax = a\alpha_1 v_1 + \cdots + a\alpha_m v_m.$$


This shows that x + y and ax belong to [S], and hence that [S] is a
subspace. If V is any subspace containing S, then certainly the definition
requires that V contain every vector x of the form (2), and hence that V
contain [S].


A given subspace V ≠ {0} is always spanned by many different sets of
vectors. For example, a line through the origin is spanned by any nonzero 
vector on it; a two-dimensional plane through the origin is spanned by any two 
vectors in it that do not lie on the same line through the origin. 


Exercise 4  Suppose that the two vectors do lie on a line through the origin. What do
they span?


DEFINITION 1.4  The dimension of a subspace V ≠ {0} is the smallest number of vectors that
span it. In other words, the dimension of V is m if there exists some set of m
vectors that spans V, but no set of m − 1 vectors spans V. The dimension
of {0} is 0.


Another way to state the definition is that the dimension of V is ≤ m if there
exists a set of m vectors that spans V.




Example  Let $V = R^n$ and let


$$e_1 = (1, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad e_3 = (0, 0, 1, 0, \ldots, 0),$$

and in general let $e_j$ be the vector with jth coordinate 1 and all others 0. It is
immediately seen that if $x = (x_1, \ldots, x_n)$, then

$$x = \sum_{j=1}^{n} x_j e_j. \tag{3}$$

This shows that the vectors $e_1, \ldots, e_n$ span $R^n$, and hence that the dimension
is ≤ n. It is true that the dimension is equal to n, but this is not so easy to show.
It follows from the results of Section 2, which provide a simple condition that
m vectors span a space of dimension m and not one of some smaller dimension.


2  BASES

DEFINITION 2.1  Vectors $v_1, \ldots, v_m$ are linearly dependent if there exist real numbers
$\alpha_1, \ldots, \alpha_m$ not all 0 such that

$$\sum_{j=1}^{m} \alpha_j v_j = 0. \tag{1}$$


The vectors are linearly independent if they are not linearly dependent.


Example 1  Formula (3) in the last section shows that the vectors $e_1, \ldots, e_n$ described
there are linearly independent.


Exercise 1  A single vector is linearly dependent if and only if it is 0. Two vectors are
linearly dependent if and only if they lie on a line through 0. 


The example and the exercise suggest that m vectors are linearly independent 
if and only if they span a subspace of dimension m. To prove it we shall make 
use of the following definition and lemma. 


DEFINITION 2.2  A vector x is a linear combination of the vectors $v_1, \ldots, v_m$ if there exist
real numbers $\alpha_1, \ldots, \alpha_m$ such that


$$x = \sum_{j=1}^{m} \alpha_j v_j.$$


In terms of this definition, the span of a set of vectors consists of all linear 
combinations of these vectors. 


LEMMA 2.3  If $v_1, \ldots, v_m$ are linearly dependent, then one of them is a linear combination
of preceding ones.

Proof  Let equation (1) hold, and let $\alpha_k$ be the last nonzero $\alpha_j$. Then

$$v_k = -\sum_{j=1}^{k-1} \frac{\alpha_j}{\alpha_k}\, v_j.$$

THEOREM 2.4  If $v_1, \ldots, v_k$ are linearly independent and lie in the span of $w_1, \ldots, w_m$,
then k ≤ m.

Proof  Let V be the span of $w_1, \ldots, w_m$, and suppose, to get a contradiction,
that m < k. Let $S^0 = \{w_1, \ldots, w_m\}$. We shall prove by induction
that for each j = 0, 1, ..., m there is a set $S^j$ composed of $v_1, \ldots, v_j$
and of m − j of the w's which still spans V. This will indeed be a
contradiction, for it will imply that $S^m = \{v_1, \ldots, v_m\}$ spans V, so that

$$v_{m+1} = \sum_{i=1}^{m} \alpha_i v_i,$$

which is impossible if the v's are linearly independent.


Suppose that we have already $S^j = \{v_1, \ldots, v_j, u_{j+1}, \ldots, u_m\}$,
where $u_{j+1}, \ldots, u_m$ are certain of the w's. Since $S^j$ spans V, we have

$$v_{j+1} = \sum_{i=1}^{j} \alpha_i v_i + \sum_{i=j+1}^{m} \alpha_i u_i,$$

which implies that the vectors $v_1, \ldots, v_{j+1}, u_{j+1}, \ldots, u_m$ are linearly
dependent. According to the lemma, one of them is a linear combination
of preceding ones. It cannot be a v that is a linear combination of
preceding ones, since the v's are linearly independent. Therefore, some u,
say $u_l$, is a linear combination of $v_1, \ldots, v_{j+1}$ and of the other u's; then
$S^{j+1} = S^j \cup \{v_{j+1}\} - \{u_l\}$ still spans V.


Theorem 2.4 makes it easy to show that the dimension of $R^n$ is n. It has
been seen that the vectors $e_1, \ldots, e_n$ are linearly independent and that they
span $R^n$. The fact that they span means (by the definition) that the dimension
is ≤ n, and the fact that they are linearly independent means (by Theorem 2.4)
that no smaller number of vectors can span. Similar arguments can be used to discuss
subspaces in general.
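Theorem 2.4 and the definition of dimension can be explored computationally. The sketch below is an illustration rather than part of the text: it computes the dimension of the span of finitely many vectors by exact row reduction over the rationals, relying on the standard fact (assumed here, not proved in this section) that row operations do not change the span. The function names `rank` and `linearly_independent` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of the given vectors, by exact row reduction."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the column below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """m vectors are independent exactly when they span a space of dimension m."""
    return rank(vectors) == len(vectors)

if __name__ == "__main__":
    e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(rank(e), linearly_independent(e))
    # Four vectors in R^3 can never be independent (Theorem 2.4 with m = 3).
    print(linearly_independent(e + [(1, 1, 1)]))
```

Exact fractions are used instead of floating point so that the test for a zero pivot is reliable.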


DEFINITION 2.5  A basis of a subspace V is a set of linearly independent vectors that span V.




THEOREM 2.6  Every set of linearly independent vectors in V is part of a basis of V.


Proof  Let $v_1, \ldots, v_k$ in V be linearly independent. If they do not already
span V, then some x ∈ V is not a linear combination of $v_1, \ldots, v_k$, in
which case Lemma 2.3 shows that $v_1, \ldots, v_k, x$ are linearly independent.
Write $x = v_{k+1}$ and start again. $v_1, \ldots, v_{k+1}$ are linearly independent.
If they do not span V, then there exists $v_{k+2} \in V$ so that $v_1, \ldots, v_{k+2}$ are
linearly independent. In due course the process must stop, because
Theorem 2.4 shows that more than n vectors in $R^n$ cannot be linearly
independent.


COROLLARY 2.7  Every subspace V ≠ {0} has a basis.


Proof  Use the theorem, starting with any nonzero vector in V.


In order to avoid treating the case V = {0} as an exceptional one in every 
theorem, it is convenient to consider the empty set as a basis of {0}. This 
requires occasional special proofs, which will always be left to the reader. 


THEOREM 2.8  Let the dimension of V be m. Then every basis of V contains exactly m
vectors, and every set of m linearly independent vectors in V is a basis of V.


Remark  Up to the point of Corollary 2.7 it was not clear that the definition of the
dimension really made sense. That is, it was not clear that every subspace of
$R^n$ is spanned by a finite number of vectors. The corollary does make this
clear.


Exercise 2  Prove Theorem 2.8 (on the basis, of course, of Theorem 2.4).


Exercise 3  Show that if V ⊂ W and V ≠ W, then dim V < dim W, where dim V is the
dimension of V.


THEOREM 2.9  The vectors $v_1, \ldots, v_m$ in V are a basis of V if and only if every vector
x ∈ V can be written in one and only one way in the form


$$x = \sum_{j=1}^{m} \alpha_j v_j, \tag{2}$$


where $\alpha_1, \ldots, \alpha_m$ are real numbers. The number $\alpha_j$ is called the jth
coordinate of x relative to the basis $v_1, \ldots, v_m$.


Proof  The fact that each x ∈ V can be written in the form (2) is part of the
definition: it is the part that says that $v_1, \ldots, v_m$ span V. The uniqueness
is proved as follows. If

$$x = \sum_{j=1}^{m} \beta_j v_j,$$

then

$$0 = \sum_{j=1}^{m} (\alpha_j - \beta_j) v_j;$$

then $\alpha_j - \beta_j$ must be 0 because $v_1, \ldots, v_m$ are linearly independent.


[Figure 1: $x = \alpha_1 v_1 + \alpha_2 v_2$]


Geometrically, we can think of a basis of V as determining a set of coordinate
axes in V. The jth coordinate axis is the line through $v_j$ and the origin. In
two dimensions it looks as shown in Figure 1. To get the first coordinate of x
relative to the basis $v_1, v_2$, draw the line through x parallel to $v_2$. The point
where it meets the line through $v_1$ is a multiple of $v_1$, say $\alpha_1 v_1$. The number $\alpha_1$
is the first coordinate of x.
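In two dimensions the coordinates relative to a basis can also be computed directly, by solving the two linear equations contained in x = α₁v₁ + α₂v₂. The following sketch uses Cramer's rule, a technique chosen for this illustration and not taken from the text; the function name `coordinates_2d` is hypothetical.

```python
from fractions import Fraction

def coordinates_2d(v1, v2, x):
    """Return (a1, a2) with x = a1*v1 + a2*v2, for a basis v1, v2 of R^2."""
    det = Fraction(v1[0]) * v2[1] - Fraction(v1[1]) * v2[0]
    if det == 0:
        # v1 and v2 lie on a line through 0, so they are linearly dependent.
        raise ValueError("v1 and v2 are not a basis of R^2")
    a1 = (Fraction(x[0]) * v2[1] - Fraction(x[1]) * v2[0]) / det
    a2 = (Fraction(v1[0]) * x[1] - Fraction(v1[1]) * x[0]) / det
    return a1, a2

if __name__ == "__main__":
    # With v1 = (2, 1), v2 = (1, 3) and x = (4, 7): x = 1*v1 + 2*v2.
    print(coordinates_2d((2, 1), (1, 3), (4, 7)))
```

The uniqueness asserted in Theorem 2.9 corresponds to the determinant being nonzero: when it vanishes, the two vectors lie on one line through the origin and no unique pair of coordinates exists.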


Exercise 4  In a similar geometric way, describe how to get the coordinates of a vector
$x \in R^3$ relative to a basis $v_1, v_2, v_3$.

A plane in $R^n$ is a set that is "parallel" to a subspace. To describe this
properly we shall need a couple of definitions.


DEFINITION 2.10  If X is a subset of $R^n$ and a is a point, then X + a is the set of all points
x + a with x ∈ X. It is called the translate of X by a. X − a =
X + (−a) is the set of all points x − a with x ∈ X.


DEFINITION 2.11  A subset Π of $R^n$ is a plane of dimension m if for some point a, Π − a is a
subspace of dimension m.




Exercise 5  If Π is a plane, then Π − a is a subspace if and only if a ∈ Π. If a and b are
any two points of a plane Π, then Π − a = Π − b.


DEFINITION 2.12  If Π is a plane and a ∈ Π, the subspace Π − a is called the subspace parallel
to Π.


By Exercise 5, the subspace parallel to a plane Π does not depend on the point a
used to define it (Figure 2).


Exercise 6  A line (in the sense of Chapter 8) is a plane of dimension 1, and conversely.


One typical way to determine a plane of dimension m is to use m + 1
points on it. A line is determined by two points, an ordinary plane is determined
by three points, and so on.


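Example 2  Find the plane determined by the points $a_0, \ldots, a_m$.


Let Π be the plane and let V be the parallel subspace. Then V is spanned
by the points $a_1 - a_0, \ldots, a_m - a_0$, so


$$\Pi - a_0 = V = \Big\{x : x = \sum_{j=1}^{m} t_j (a_j - a_0)\Big\}.$$

Hence


$$\Pi = \Big\{x : x = a_0 + \sum_{j=1}^{m} t_j (a_j - a_0)\Big\}, \tag{3}$$

or

$$\Pi = \Big\{x : x = \sum_{j=0}^{m} t_j a_j \ \text{ with } \ \sum_{j=0}^{m} t_j = 1\Big\}. \tag{4}$$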




To pass back and forth between (3) and (4) simply note that the coefficient of
$a_0$ in (3) is


$$t_0 = 1 - \sum_{j=1}^{m} t_j.$$


Formula (4) is a little the better of the two because it does not give preference
to $a_0$ over the other points, but formula (3) is sometimes easier to use.


Exercise 7  Find the plane in $R^3$ determined by the three points (1, 0, 0), (0, 1, 0), and
(0, 0, 1).


Of course, m + 1 points determine a plane of dimension < m unless the
points are independent in a suitable sense. Three collinear points, for instance,
determine a line, not a two-dimensional plane.


DEFINITION 2.13  The points $a_0, \ldots, a_m$ are affinely dependent if there exist numbers
$t_0, \ldots, t_m$ not all 0 such that

$$\sum_{j=0}^{m} t_j a_j = 0 \quad \text{and} \quad \sum_{j=0}^{m} t_j = 0.$$


THEOREM 2.14  The following are equivalent:


(a) $a_0, \ldots, a_m$ are affinely independent.

(b) $a_1 - a_0, \ldots, a_m - a_0$ are linearly independent.

(c) $a_0, \ldots, a_m$ determine a plane of dimension m.

(d) The points $(a_0, 1), \ldots, (a_m, 1)$ in $R^{n+1}$ are linearly independent,
where (a, 1) is the point whose first n coordinates are the coordinates
of a and whose last coordinate is 1.


Exercise 8  Prove the theorem.


A second typical way to determine a plane of dimension m in $R^n$ is to use
one point on the plane and n − m directions that are perpendicular to the plane.


Example 3  Find the plane through a given point a and perpendicular to given directions
$\theta_1, \ldots, \theta_k$, k = n − m.


This means, of course, that the parallel subspace is perpendicular to the
given directions. Thus, if Π is the plane and x is a point of Π, then x − a is
perpendicular to each $\theta_j$. Therefore, the equations of Π are simply


$$(x - a, \theta_j) = 0, \qquad j = 1, \ldots, k. \tag{5}$$




Exercise 9  Find the plane in $R^3$ that passes through the point (1, −2, 3) and is perpendicular
to the direction of (−1, −8, 6).


Exercise 10  Use Theorem 3.7 below to show that if $\theta_1, \ldots, \theta_k$ are linearly independent,
then the equations (5) define a plane of dimension n − k.


3  ORTHONORMAL BASES


In general the coordinate axes determined by an arbitrary basis are not perpen- 
dicular to one another. In problems that involve only addition and scalar 
multiplication, this is usually immaterial. One basis is as good as another. 
But in problems involving the distance or the inner product, the calculations 
are usually much simpler if the coordinate axes are perpendicular. 


DEFINITION 3.1  A set of vectors is orthonormal if each vector has length 1 and any two are
perpendicular to one another. In other words, $e_1, \ldots, e_m$ is orthonormal if


$$(e_i, e_j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$


Exercise 1  Show that the usual basis of $R^n$, that is, $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$,
and so on, is orthonormal.


Remark  A vector of length 1 is sometimes said to be "normalized." The word
orthonormal is a combination of orthogonal and normalized.


THEOREM 3.2  Let $e_1, \ldots, e_m$ be orthonormal, and let

$$x = \sum_{j=1}^{m} \alpha_j e_j. \tag{1}$$

Then

$$\alpha_k = (x, e_k). \tag{2}$$


Proof  Take the inner product on both sides of (1) with $e_k$. Then $(x, e_k)$ appears
on the left and $\alpha_k$ appears on the right (since $(e_j, e_k) = 0$ for all j ≠ k,
while $(e_k, e_k) = 1$).


COROLLARY 3.3  An orthonormal set is linearly independent.


Proof  If x = 0 in formula (1), then formula (2) shows that each $\alpha_k$ is 0.




Example  Rotate the coordinate axes in $R^2$ through an angle θ. Find the formula relating
the new coordinates of a point x to the old ones.

If $e_1, e_2$ is the original orthonormal basis and $e_1', e_2'$ is the new one, then

$$e_1 = (1, 0), \quad e_2 = (0, 1), \quad e_1' = (\cos\theta, \sin\theta), \quad e_2' = (-\sin\theta, \cos\theta).$$

If $(x_1, x_2)$ are the original coordinates of x and $(x_1', x_2')$ are the new ones, then

$$x = x_1 e_1 + x_2 e_2 \quad \text{and} \quad x = x_1' e_1' + x_2' e_2'.$$

According to Theorem 3.2, we have


$$\begin{aligned}
x_1' &= (x, e_1') = x_1 (e_1, e_1') + x_2 (e_2, e_1') = x_1 \cos\theta + x_2 \sin\theta, \\
x_2' &= (x, e_2') = x_1 (e_1, e_2') + x_2 (e_2, e_2') = -x_1 \sin\theta + x_2 \cos\theta.
\end{aligned} \tag{3}$$
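Formula (3) is easy to try out. The sketch below is an illustration with hypothetical function names: it computes the new coordinates from the old ones as in (3), and then rebuilds the point from the new coordinates and the rotated basis to confirm that both descriptions name the same point.

```python
import math

def rotate_coordinates(x1, x2, theta):
    """New coordinates (x1', x2') of the point (x1, x2), as in formula (3)."""
    c, s = math.cos(theta), math.sin(theta)
    return x1 * c + x2 * s, -x1 * s + x2 * c

def from_new_coordinates(y1, y2, theta):
    """The point y1*e1' + y2*e2', with e1' = (cos, sin) and e2' = (-sin, cos)."""
    c, s = math.cos(theta), math.sin(theta)
    return y1 * c - y2 * s, y1 * s + y2 * c

if __name__ == "__main__":
    theta = math.pi / 6
    y1, y2 = rotate_coordinates(2.0, 1.0, theta)
    print((y1, y2))
    print(from_new_coordinates(y1, y2, theta))   # approximately (2.0, 1.0) again
```

The length of the coordinate vector is unchanged by the rotation, which reflects the fact that both bases are orthonormal.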


The calculations are the same for rotations of the three-dimensional space
and, in general, for passing from one orthonormal basis of $R^n$ to another. If


$$x = \sum_{j=1}^{n} x_j e_j \quad \text{and} \quad x = \sum_{j=1}^{n} x_j' e_j',$$


then


$$x_k' = (x, e_k') = \sum_{j=1}^{n} x_j (e_j, e_k'),$$


so what has to be calculated is each of the inner products $(e_j, e_k')$.

Exercise 2  The number $(e_j, e_k')$ is the cosine of the angle between the $x_j$ and $x_k'$ axes.


THEOREM 3.4  Let V be a subspace with orthonormal basis $e_1, \ldots, e_m$. Each vector
$x \in R^n$ can be written uniquely as


$$x = x' + x'', \tag{4}$$


where x' is in V and x'' is perpendicular to V. In fact,


$$x' = \sum_{j=1}^{m} (x, e_j) e_j. \tag{5}$$


Exercise 3  Prove the uniqueness part of the theorem. Show that x' is closer to x than any
other point of V, and give a geometric statement of the theorem.


Proof  If we define x' by formula (5), then certainly x' ∈ V, and what we have
to show is that x'' = x − x' is perpendicular to V (i.e., is perpendicular to
every vector in V). Theorem 3.2 shows that $(x', e_k) = (x, e_k)$ for each k,




and hence that $(x'', e_k) = 0$ for each k. If y is any vector in V, then y is a
linear combination of the $e_j$'s, say


$$y = \sum_{j=1}^{m} \alpha_j e_j,$$


and then


$$(x'', y) = \sum_{j=1}^{m} \alpha_j (x'', e_j) = 0.$$


THEOREM 3.5  Every subspace has an orthonormal basis. Indeed, any orthonormal subset
in V is part of an orthonormal basis of V.


Proof  Let $e_1, \ldots, e_k$ be orthonormal and lie in V, and let W be the span of
$e_1, \ldots, e_k$. It is enough to show that if W ≠ V, then there exists
$e_{k+1} \in V$ such that $e_1, \ldots, e_{k+1}$ is still orthonormal.


Exercise 4  Prove that this is enough.


If W ≠ V, take any vector x that is in V but not in W, and apply Theorem
3.4 to write x = x' + x'', where x' is in W and x'' is perpendicular to W.
Now, x'' is in V, because both x and x' are in V; and x'' ≠ 0, because x is
not in W. Hence, we can take $e_{k+1} = x''/|x''|$.
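The proof is constructive: repeating its one step, with formula (5) of Theorem 3.4 supplying x', produces an orthonormal basis from any spanning list of vectors. This procedure is usually called the Gram-Schmidt process, a name not used in the text. The following is a minimal floating-point sketch; the function names and the tolerance `tol` used to decide that x'' is zero are assumptions of the example.

```python
import math

def dot(u, v):
    """The inner product (u, v) in R^n."""
    return sum(a * b for a, b in zip(u, v))

def extend_orthonormal(ortho, candidates, tol=1e-12):
    """Extend the orthonormal list `ortho` using vectors drawn from `candidates`.

    For each candidate x, form x'' = x - sum_j (x, e_j) e_j. If x'' is
    (numerically) nonzero, x is not in the span W of the current list,
    and x''/|x''| is appended as the next basis vector.
    """
    basis = [tuple(e) for e in ortho]
    for x in candidates:
        coeffs = [dot(x, e) for e in basis]
        residual = tuple(
            xi - sum(c * e[i] for c, e in zip(coeffs, basis))
            for i, xi in enumerate(x)
        )
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:
            basis.append(tuple(r / norm for r in residual))
    return basis

if __name__ == "__main__":
    # (2, 1, 1) = (1, 1, 0) + (1, 0, 1) lies in the span of the first two
    # candidates, so it contributes no new basis vector.
    vectors = [(1, 1, 0), (1, 0, 1), (2, 1, 1), (0, 0, 1)]
    for e in extend_orthonormal([], vectors):
        print(e)
```

Starting from an empty list corresponds to the first assertion of the theorem; starting from a given orthonormal set corresponds to the second.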


Remark  What we have shown is that every orthonormal subset of V is part of an
orthonormal basis. To finish the first part of the theorem, choose x ≠ 0 in V and
start with the orthonormal set $e_1 = x/|x|$. As usual, the case V = {0} is
exceptional. To include it in the theorem we must do what?


DEFINITION 3.6  If A is any subset of $R^n$, its orthogonal complement, written $A^\perp$, is the set of
all vectors $y \in R^n$ that are perpendicular to every vector in A.


Exercise 5  For any subsets A and B we have
(a) If A ⊂ B, then $A^\perp \supset B^\perp$.
(b) $A^\perp$ is a subspace (whether A is or not).
(c) $[A]^\perp = A^\perp$.


Notice that part (c) was already used in the proof of Theorem 3.4 in the case
where $A = \{e_1, \ldots, e_m\}$, and V = [A].


THEOREM 3.7  For any subspace V of $R^n$, $\dim V + \dim V^\perp = n$.




Proof  The abbreviation "dim" stands for the dimension. Use Theorem 3.5 to
find an orthonormal basis $e_1, \ldots, e_m$ of V, and then use the theorem
again to find $e_{m+1}, \ldots, e_n$ so that $e_1, \ldots, e_n$ is an orthonormal basis
of $R^n$. What we shall show is that $e_{m+1}, \ldots, e_n$ is then an orthonormal
basis of $V^\perp$, which will imply that $\dim V^\perp = n - m$, as required.

First of all, $e_{m+1}, \ldots, e_n$ do lie in $V^\perp$ by virtue of part (c) of Exercise
5. On the other hand, for any $x \in R^n$ we have


$$x = \sum_{j=1}^{n} (x, e_j) e_j.$$


If x lies in $V^\perp$, then the first m terms are 0. Therefore, $e_{m+1}, \ldots, e_n$
span $V^\perp$, and since they are linearly independent (being orthonormal),
they do form a basis.


THEOREM 3.8  For any subspace V, $(V^\perp)^\perp = V$.


Proof  It is plain from the definition that $V \subset (V^\perp)^\perp$. On the other hand, the
dimensions of the two are the same. So by Exercise 3 of Section 2 the
two subspaces must be the same.


Exercise 6  For any subset A, $(A^\perp)^\perp = [A]$.


Theorem 3.8 gives another way to look at subspaces. If V has dimension
m, let $e_{m+1}, \ldots, e_n$ be a basis of $V^\perp$ (orthonormal if you like, but it does not
make any difference). Let

$$V_j = \{e_j\}^\perp = f_j^{-1}(0), \quad \text{where } f_j(x) = (x, e_j). \tag{6}$$


The theorem says that

$$V = V_{m+1} \cap \cdots \cap V_n. \tag{7}$$


There are two interesting consequences.

Exercise 7  If e is a nonzero vector in $R^n$, then $\{e\}^\perp$ has dimension n − 1.


Combining the exercise with formula (7), we get the first interesting
consequence.


THEOREM 3.9  Every subspace of dimension m in $R^n$ is the intersection of n − m subspaces
of dimension n − 1.




In the three-dimensional space, for instance, every line is the intersection of two
planes. Of course, the two planes can be chosen in infinitely many different
ways.


The second interesting consequence is the following one. 


Every subspace of R” is a closed set. 


Because of formula (7) it is sufficient to prove that each $V_j$ is a closed set. (Why?) For this, it is sufficient to prove that the function $f_j$ defined in formula (6) is continuous. (Why?) Now, if

\[ f(x) = (x, e), \]

then by Cauchy-Schwarz we have

\[ |f(x) - f(y)| = |(e, x - y)| \le |e|\,|x - y|, \]

which certainly does show that $f$ is continuous. (Why?)


When we come to define the tangent space to a surface at a point, it will be 
natural sometimes to define it as the span of a certain set of vectors (as in the 
initial discussion of subspaces), and other times to define it as the orthogonal 
complement of a certain set of vectors (as in Theorems 3.8 and 3.9). 

We shall finish up this section by describing an orthonormal basis in an infinite-dimensional space. The space is the space $C([-\pi, \pi])$ of continuous real-valued functions on the interval $[-\pi, \pi]$ with the inner product

\[ (f, g) = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx, \tag{8} \]

and the corresponding absolute value and distance

\[ |f| = \sqrt{(f, f)}, \qquad d(f, g) = |f - g|. \tag{9} \]

Since all questions of convergence refer to this absolute value rather than the usual one ($\|f\| = \sup\{|f(x)| : x \in [-\pi, \pi]\}$) on a space of continuous functions, we shall call the resulting metric space $\mathcal{H}$, rather than $C([-\pi, \pi])$.


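Define $e_0 = 1/\sqrt{2\pi}$ and

\[ e_n = \frac{1}{\sqrt{\pi}} \cos \frac{n}{2}x \quad \text{for $n$ even}, \qquad e_n = \frac{1}{\sqrt{\pi}} \sin \frac{n+1}{2}x \quad \text{for $n$ odd}. \]

Show that the $e_n$ are orthonormal in $\mathcal{H}$.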
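The orthonormality asserted in this exercise can be checked numerically before it is proved. What follows is a minimal sketch in Python, not a proof: the names `e`, `inner`, and `gram` and the midpoint-rule quadrature are choices made here for illustration, with $e_0 = 1/\sqrt{2\pi}$, $e_n = \cos(nx/2)/\sqrt{\pi}$ for even $n$, and $e_n = \sin((n+1)x/2)/\sqrt{\pi}$ for odd $n$.

```python
import math

def e(n, x):
    """Value at x of the n-th function of the trigonometric system."""
    if n == 0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    if n % 2 == 0:
        return math.cos(n * x / 2.0) / math.sqrt(math.pi)
    return math.sin((n + 1) * x / 2.0) / math.sqrt(math.pi)

def inner(f, g, steps=2000):
    """Midpoint-rule approximation of the integral of f*g over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * g(x)
    return total * h

def gram(count):
    """Matrix of inner products (e_i, e_j) for 0 <= i, j < count."""
    return [[inner(lambda x, i=i: e(i, x), lambda x, j=j: e(j, x))
             for j in range(count)] for i in range(count)]

G = gram(5)  # should be close to the 5 by 5 identity matrix
```

If the system is orthonormal, the computed matrix `G` should agree with the identity matrix up to quadrature and rounding error.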


THEOREM 
3.11 


Exercise 9 


Proof of the Theorem 


THEOREM 
3.12 




What we want to do is to show that the $e_n$ form an orthonormal basis of $\mathcal{H}$ in a suitable sense.


For each $f \in \mathcal{H}$ we have

\[ f = \sum_{n=0}^{\infty} a_n e_n, \qquad \text{where } a_n = (f, e_n), \tag{10} \]

and the series converges in the space $\mathcal{H}$.


The series (10) is called the Fourier series of the function $f$. The coefficients are called the Fourier coefficients. Note that they are just the coefficients that are given by Theorem 3.2. It is not claimed that the series converges uniformly, or even pointwise, but rather that the partial sums converge in the metric space $\mathcal{H}$.


Let $V_m$ be the space spanned by $e_0, \dots, e_m$, and let $s_m$ be the $m$th partial sum of the series (10). Use Theorem 3.4 and Exercise 3 to show that $f - s_m$ is orthogonal to $V_m$ and that $s_m$ is the closest point of $V_m$ to $f$.


With the aid of Exercise 8 and Theorem 12.7 of Chapter 7, the proof of the theorem is immediate. The latter says that every $f \in \mathcal{H}$ can be approximated in $\mathcal{H}$ by trigonometric polynomials, and this means precisely that

\[ \lim_{m \to \infty} d(f, V_m) = 0. \]

On the other hand, the fact that $s_m$ is the closest point of $V_m$ to $f$ means that

\[ d(f, V_m) = |f - s_m|. \]

Consequently, $s_m \to f$, and so the series converges to $f$.


The inner product and absolute value are expressed in terms of coordinates 
relative to this orthonormal basis in just the usual way. 


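If $f$ and $g$ are in $\mathcal{H}$ with Fourier coefficients $\{a_n\}$ and $\{b_n\}$, then

\[ (f, g) = \sum_{n=0}^{\infty} a_n b_n \qquad \text{and} \qquad |f|^2 = \sum_{n=0}^{\infty} a_n^2, \tag{11} \]

and the series converge absolutely.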
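The second formula in (11) can be illustrated numerically for the particular function $f(x) = x$. The sketch below is an illustration only, under assumptions made here: the helper names `e` and `inner`, the midpoint-rule quadrature, and the cutoff at the coefficients $a_0, \dots, a_{40}$ are not part of the text. The partial sums of $\sum a_n^2$ should increase toward $|f|^2 = 2\pi^3/3$ without exceeding it.

```python
import math

def e(n, x):
    """Value at x of the n-th function of the trigonometric system."""
    if n == 0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    if n % 2 == 0:
        return math.cos(n * x / 2.0) / math.sqrt(math.pi)
    return math.sin((n + 1) * x / 2.0) / math.sqrt(math.pi)

def inner(f, g, steps=4000):
    """Midpoint-rule approximation of the integral of f*g over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * g(x)
    return total * h

def f(x):
    return x

norm_sq = inner(f, f)                                    # |f|^2
coeffs = [inner(f, lambda x, n=n: e(n, x)) for n in range(41)]
partial = sum(a * a for a in coeffs)                     # sum of a_n^2 for n <= 40
```

The gap `norm_sq - partial` is the squared distance from $f$ to the partial sum $s_{40}$; it is positive and shrinks as more coefficients are included.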




Exercise 10 


If $s_m$ and $t_m$ are the partial sums, then

\[ (s_m, t_m) = \sum_{n=0}^{m} a_n b_n \qquad \text{and} \qquad |s_m|^2 = \sum_{n=0}^{m} a_n^2. \]


Remark Things like this, and also Exercise 9, can be proved either by repeating the proofs we have already had or else by noticing that $V_m$ is a finite-dimensional space; so it is the same as $R^{m+1}$ and there is after all nothing really to prove.


Exercise 11


Exercise 12


Exercise 13


4


DEFINITION
4.1


If $s_m \to f$ and $t_m \to g$, then $(s_m, t_m) \to (f, g)$. Deduce from this and Exercise 10 that formula (11) holds.


What remains of the proof of the theorem is to show that the series (11) converge absolutely. For the second one there is no problem, since each term is nonnegative. Show that the first converges absolutely by using the Cauchy-Schwarz inequality to prove that

\[ \sum_{n=0}^{\infty} |a_n b_n| \le |f|\,|g|. \]


Questions about pointwise convergence of Fourier series are extremely difficult. For a great many years, in fact, one of the celebrated unsolved problems of analysis was to decide whether the Fourier series of a continuous function must converge at at least one point. A couple of years ago the Swedish mathematician, L. Carleson, succeeded in showing that it must, and, in fact, that it must converge at "almost all" points, the "almost all" being a technical term that is explained in Chapter 13.


Calculate the Fourier series for the functions $x$, $|x|$, and $e^x$. Discuss the pointwise convergence as far as you can.


LINEAR TRANSFORMATIONS 


The simplest functions from $R^m$ to $R^n$ are the linear functions (or linear transformations, or linear operators; the three terms mean the same thing).


A linear transformation from $R^m$ to $R^n$ is a function $T: R^m \to R^n$ such that

\[ T(x + y) = T(x) + T(y), \qquad T(ax) = aT(x), \tag{1} \]

for all $x$ and $y$ in $R^m$ and all real $a$.


THEOREM 
4.2 


Proof 


Exercise 1 


DEFINITION 
4-3 




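Let $e_1, \dots, e_m$ be a basis of $R^m$ and let $v_1, \dots, v_m$ be arbitrary vectors in $R^n$. There is one and only one linear transformation $T: R^m \to R^n$ such that

\[ T(e_j) = v_j \qquad \text{for } j = 1, \dots, m. \]


Formula (1) implies that

\[ \text{if } x = \sum_{j=1}^{m} x_j e_j, \quad \text{then } T(x) = \sum_{j=1}^{m} x_j T(e_j), \tag{2} \]

from which it is plain how to define $T$:

\[ \text{if } x = \sum_{j=1}^{m} x_j e_j, \quad \text{then } T(x) = \sum_{j=1}^{m} x_j v_j. \tag{3} \]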
2 2 


The definition makes sense because every vector x in R™ can be written 
in one and only one way in the form indicated (Theorem 2.9), and (2) 
shows that (3) is the only possible definition. What remains is to show 
that T is linear, that is, that condition (1) holds. 


Show that the function $T$ defined by (3) is linear.


Once bases are fixed in both $R^m$ and $R^n$ the linear transformations can be represented in a concrete way. Let $e_1, \dots, e_m$ and $f_1, \dots, f_n$ be bases in $R^m$ and $R^n$, and let $T$ be a linear transformation from $R^m$ to $R^n$. If $a_{ij}$ is the $i$th coordinate of $T(e_j)$ relative to the basis $f_1, \dots, f_n$, then (by the definition of coordinates relative to a basis)

\[ T(e_j) = \sum_{i=1}^{n} a_{ij} f_i, \qquad j = 1, \dots, m. \tag{4} \]

On the other hand, if the numbers $a_{ij}$ are given, then formula (4) determines a unique linear transformation $T$. Indeed, call the right-hand side $v_j$ and apply Theorem 4.2.


The double sequence $\{a_{ij}\}$ is called the matrix of the linear transformation $T$ relative to the bases $e_1, \dots, e_m$ and $f_1, \dots, f_n$.


What has been shown is that there is a one-to-one correspondence between 
linear transformations and matrices once the bases are fixed. 




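Ordinarily, the numbers in a matrix are displayed in a rectangular array with $a_{ij}$ in the $i$th row, $j$th column.

\[
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1m} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2m} \\
\vdots & \vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nm}
\end{pmatrix}
\]

The way to remember it is that the coordinates of $T(e_j)$ go down the $j$th column.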
The relation between a linear transformation and its matrix can be expressed 
in another way. 


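THEOREM 4.4 Let $\{a_{ij}\}$ be the matrix of a linear transformation $T$ relative to bases $e_1, \dots, e_m$ and $f_1, \dots, f_n$. Let

\[ x = \sum_{j=1}^{m} x_j e_j \qquad \text{and} \qquad y = \sum_{i=1}^{n} y_i f_i. \]

Then $y = T(x)$ if and only if

\[ y_i = \sum_{j=1}^{m} a_{ij} x_j \qquad \text{for } i = 1, \dots, n. \tag{5} \]

Proof Formulas (2) and (4) give

\[ T(x) = \sum_{j=1}^{m} x_j T(e_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} x_j a_{ij} f_i. \]

The coordinate that multiplies $f_i$ is the number $y_i$ in (5).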


Note that formula (4) expresses the relation between a linear transformation and its matrix by showing how the basis vectors transform, while formula (5) expresses the relation by showing how coordinates transform. The equations (5) are called a system of $n$ linear equations in $m$ unknowns. The theorem shows that the study of linear transformations is equivalent to the study of systems of linear equations. Usually, a given problem can be studied from either point of view. Sometimes the one is convenient, sometimes the other.
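The two descriptions, by basis vectors as in formula (4) and by coordinates as in formula (5), can be compared on a small example. The following is a sketch under assumptions made here: it uses the natural bases of $R^3$ and $R^2$, and the map `T` and the helper names `matrix_of` and `apply` are hypothetical, chosen only for illustration.

```python
def matrix_of(T, m):
    """Matrix of T: R^m -> R^n relative to the natural bases.

    The j-th column holds the coordinates of T(e_j), as in formula (4).
    """
    columns = []
    for j in range(m):
        e_j = [1.0 if k == j else 0.0 for k in range(m)]
        columns.append(T(e_j))
    n = len(columns[0])
    return [[columns[j][i] for j in range(m)] for i in range(n)]

def apply(A, x):
    """Coordinates y_i = sum_j a_ij x_j, as in formula (5)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def T(x):
    """A hypothetical linear map from R^3 to R^2."""
    return [2.0 * x[0] - x[2], x[0] + 3.0 * x[1]]

A = matrix_of(T, 3)
x = [1.0, 2.0, 3.0]
y_from_matrix = apply(A, x)
y_from_map = T(x)
```

Both computations give the same vector, which is what Theorem 4.4 asserts for this choice of bases.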

There is still another way to express the relation between a linear transformation and its matrix that works when the bases are orthonormal. Take the inner product with $f_k$ on both sides of formula (4). If $f_1, \dots, f_n$ is orthonormal, then all terms but one drop out:

\[ (T(e_j), f_k) = \sum_{i=1}^{n} a_{ij} (f_i, f_k) = a_{kj}. \]


THEOREM
4.5


Exercise 2


Exercise 3


Exercise 4


Exercise 5


When $T$ is a linear transformation, it is customary to write $Tx$ instead of $T(x)$ as long as no confusion is likely. In this notation the result above is as follows:


Let $\{a_{ij}\}$ be the matrix of $T$ relative to bases $e_1, \dots, e_m$ and $f_1, \dots, f_n$. If $f_1, \dots, f_n$ is orthonormal, then

\[ a_{ij} = (Te_j, f_i). \tag{6} \]


If $a \in R^m$, then the function

\[ Tx = (x, a) \tag{7} \]

is linear from $R^m$ to $R^1$. Conversely, every linear transformation from $R^m$ to $R^1$ has this form and $a$ is unique.


Let $V$ be a subspace of $R^n$. For each $x \in R^n$, let $Px$ be the closest point in $V$ to $x$. (See Theorem 3.4 and Exercise 3 which follows it.) Show that $P$ is a linear transformation. Show how to choose a basis of $R^n$ so that the matrix of $P$ looks as shown in Figure 3, where everything is 0 in the areas marked by big zeros. How many 1's are there? (Note that since $P: R^n \to R^n$, there is only one basis to choose, which plays the role of both the $e$'s and the $f$'s.) $P$ is called the projection on the subspace $V$.


Define the graph of a function $f: R^m \to R^n$. (It is a subset of $R^{m+n}$.)


Show that a function $f: R^m \to R^n$ is linear if and only if its graph is a subspace of $R^{m+n}$.


Figure 3 




5


THEOREM 
5.1 


Exercise 1 


Exercise 2 


Exercise 3 
Exercise 4 


Exercise 5 


SUMS AND PRODUCTS 


If $f$ and $g$ are functions from a set $X$ into $R^n$, there is a natural way to define the sum $f + g$ and scalar product $af$:

\[ (f + g)(x) = f(x) + g(x), \qquad (af)(x) = a f(x). \tag{1} \]


If $S$ and $T$ are linear transformations from $R^m$ to $R^n$, then $S + T$ and $aS$ are linear. If $S$ and $T$ have matrices $\{s_{ij}\}$ and $\{t_{ij}\}$ relative to given bases, then $S + T$ and $aS$ have the matrices $\{u_{ij}\}$ and $\{v_{ij}\}$ defined by

\[ u_{ij} = s_{ij} + t_{ij}, \qquad v_{ij} = a s_{ij}. \tag{2} \]

Prove the theorem.


Formula (2) is used to define the sum of two matrices and the product of a matrix by a number. If $M$ and $N$ are the matrices $\{s_{ij}\}$ and $\{t_{ij}\}$, then $M + N$ is the matrix $\{u_{ij}\}$ defined in (2), and $aM$ is the matrix $\{v_{ij}\}$. This definition makes no reference to linear transformations, and it is rather natural; but the reason for making it is Theorem 5.1, which says that the matrix of a sum $S + T$ is then the sum of the matrices.

The composite of two linear transformations $S$ and $T$ (which makes sense if $R^l \xrightarrow{S} R^m \xrightarrow{T} R^n$) is usually called the product and is written $TS$ rather than $T \circ S$.


If $S$ and $T$ are linear, then $TS$ is linear (when the spaces are right so that it makes sense).


The reason for calling the composite a product is simply that it acts like a 
product in many respects. 


$R(S + T) = RS + RT$, $(S + T)R = SR + TR$, and $(RS)T = R(ST)$, whenever the formulas make sense. For example, the first formula makes sense when $S, T: R^l \to R^m$ and $R: R^m \to R^n$. Discuss when the others make sense.


The linear transformation $I: R^n \to R^n$ defined by $Ix = x$ for each $x$ is called the identity. It acts like the number 1 in the sense that $SI = S$ and $IT = T$ if the products are defined.


Suppose that $T: R^m \to R^n$ is one to one and onto so that as a function it has an inverse $T^{-1}: R^n \to R^m$. Show that if $T$ is linear, then $T^{-1}$ is linear and acts like the reciprocal of $T$ in the sense that $TT^{-1} = I$ and $T^{-1}T = I$. Are the two $I$'s the same? (This is an unfair question. In the next section we shall show that $m$ must equal $n$, so in fact they are. However, all these considerations apply equally well to linear transformations from one subspace $V$ to another one $W$, in which case the $I$'s do not have to be the same.)


Exercise 6


THEOREM
5.2


Exercise 7


In some respects the product of linear transformations is quite unlike the 
product of numbers. 


Give an example of $S, T: R^2 \to R^2$ such that $ST = 0$, while $TS \ne 0$. This
shows that on the one hand the product of two nonzero transformations can be 
zero, and on the other that the product is not the same when the order is 
reversed. 


Let us calculate the matrix of a product. Choose bases $d_1, \dots, d_l$ of $R^l$, $e_1, \dots, e_m$ of $R^m$, and $f_1, \dots, f_n$ of $R^n$, and let $S$ and $T$ have matrices $\{s_{kj}\}$ and $\{t_{ik}\}$ relative to these bases. Then

\[ Sd_j = \sum_{k} s_{kj} e_k \qquad \text{and} \qquad Te_k = \sum_{i} t_{ik} f_i, \]

so

\[ TSd_j = \sum_{k} s_{kj}\, Te_k = \sum_{i} \Bigl\{ \sum_{k} t_{ik} s_{kj} \Bigr\} f_i, \]

which gives Theorem 5.2:


If $S$ and $T$ have matrices $\{s_{ij}\}$ and $\{t_{ij}\}$ relative to given bases in the three spaces, then $TS$ has the matrix $\{u_{ij}\}$ defined by

\[ u_{ij} = \sum_{k} t_{ik} s_{kj}. \tag{3} \]


As in the case of sums, formula (3) is used to define the product of two matrices. If $M = \{s_{ij}\}$ and $N = \{t_{ij}\}$, then $NM = \{u_{ij}\}$, where $u_{ij}$ is given by (3). The reason for the definition is to make the matrix of a product equal to the product of the matrices.
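Formula (3) can be tried out directly. In the sketch below the matrices `S` and `T`, the vector `x`, and the helper names `matmul` and `apply` are hypothetical choices made for illustration; the natural bases are assumed throughout, so that applying a matrix to a coordinate vector is the same as applying the transformation.

```python
def matmul(N, M):
    """Product NM with entries u_ij = sum_k t_ik s_kj, as in formula (3)."""
    inner_dim = len(M)
    if any(len(row) != inner_dim for row in N):
        raise ValueError("columns of N must match rows of M")
    return [[sum(N[i][k] * M[k][j] for k in range(inner_dim))
             for j in range(len(M[0]))] for i in range(len(N))]

def apply(A, x):
    """Apply the matrix A to the coordinate vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

S = [[1, 2], [0, 1], [3, 0]]   # matrix of S: R^2 -> R^3
T = [[1, 0, 2], [0, 1, 1]]     # matrix of T: R^3 -> R^2
TS = matmul(T, S)              # matrix of the composite TS: R^2 -> R^2

x = [5, -1]
via_product = apply(TS, x)
via_composite = apply(T, apply(S, x))
```

Applying the product matrix and applying the two transformations in succession give the same vector.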


What are the conditions on two matrices M and N in order that the product NM 
make sense? 


In assigning a matrix to a linear transformation $T: R^m \to R^n$, we have to choose two bases, one in $R^m$ and the other in $R^n$. When $m = n$, that is, when $T$ is a linear transformation from $R^n$ to $R^n$, we often (though not always) choose just one basis and let it play the role of both. Thus, when we say that $M = \{m_{ij}\}$ is the matrix of $T$ relative to the basis $e_1, \dots, e_n$, we mean that

\[ Te_j = \sum_{i=1}^{n} m_{ij} e_i. \]


Exercise 8


Exercise 9


Exercise 10


6


DEFINITION
6.1


Exercise 1


THEOREM
6.2


THEOREM
6.3


Proof


The matrix of the identity $I: R^n \to R^n$ (relative to any one basis) is the matrix with 1's along the diagonal and 0's everywhere else. It also is called $I$. For any matrices $M$ and $N$, we have $MI = M$ and $IN = N$ if the products make sense.


A square matrix $M$ is invertible if there is a matrix $M^{-1}$ such that $MM^{-1} = I$ and $M^{-1}M = I$.


A square matrix M is invertible if and only if the corresponding linear trans- 
formation (relative to any basis) is invertible. Why does M have to be square? 


Two square matrices $M$ and $N$ are the matrices of the same linear transformation $T: R^n \to R^n$ ($M$ relative to one basis, $N$ relative to some other basis) if and only if there is an invertible matrix $U$ such that $M = UNU^{-1}$.


NULL SPACE AND RANGE 


The null space of a linear transformation $T: R^m \to R^n$ is the set

\[ N_T = T^{-1}(0) = \{x \in R^m : Tx = 0\}. \]

The range is the set

\[ R_T = T(R^m) = \{y \in R^n : y = Tx \text{ for some } x \in R^m\}. \]


Prove the following theorem.


The null space of a linear $T: R^m \to R^n$ is a subspace of $R^m$; the range is a subspace of $R^n$. $T$ is one to one if and only if $N_T = \{0\}$.


Here is a basic dimension formula.


Let $T$ be a linear transformation from $R^m$ to $R^n$. Then

\[ \dim N_T + \dim R_T = m. \tag{1} \]


Choose a basis $e_1, \dots, e_k$ of $N_T$, and then choose $e_{k+1}, \dots, e_m$ so that $e_1, \dots, e_m$ is a basis of $R^m$ (Theorem 2.6). Then the dimension of $N_T$ is $k$, so what must be proved is that the dimension of $R_T$ is $m - k$. For this it is sufficient to prove that $Te_{k+1}, \dots, Te_m$ is a basis of $R_T$.


Exercise 2


COROLLARY
6.4


Proof


THEOREM
6.5


Proof
If $y \in R_T$, then $y = Tx$ for some

\[ x = \sum_{j=1}^{m} x_j e_j, \]

and then

\[ y = Tx = \sum_{j=k+1}^{m} x_j Te_j. \]

(The first $k$ terms drop out because the first $k$ $e$'s are in the null space.) This shows that every $y \in R_T$ is a linear combination of $Te_{k+1}, \dots, Te_m$, hence that they span $R_T$.

It remains to show that they are linearly independent. If

\[ \sum_{j=k+1}^{m} a_j Te_j = 0, \]

then $\sum_{j=k+1}^{m} a_j e_j$ is in the null space; therefore,

\[ \sum_{j=k+1}^{m} a_j e_j = \sum_{j=1}^{k} a_j e_j \tag{2} \]

for suitable numbers $a_1, \dots, a_k$. But this is impossible unless all $a_j$ are 0, because the $e$'s are linearly independent.
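The dimension formula (1) can be checked on a concrete matrix. The sketch below, in Python with exact rational arithmetic, is only an illustration: the helper names `rref` and `null_space_basis` and the sample matrix `A` are assumptions made here, and row reduction is a standard technique swapped in for the basis-extension argument of the proof. The number of pivot columns is the dimension of the range, and the number of free columns is the dimension of the null space.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over the rationals; returns (R, pivot_columns)."""
    R = [[Fraction(v) for v in row] for row in A]
    pivots = []
    r = 0
    for c in range(len(R[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        R[r] = [v / R[r][c] for v in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                R[i] = [a - R[i][c] * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots

def null_space_basis(A):
    """One basis vector of the null space for each non-pivot (free) column."""
    R, pivots = rref(A)
    m = len(A[0])
    basis = []
    for free in (c for c in range(m) if c not in pivots):
        v = [Fraction(0)] * m
        v[free] = Fraction(1)
        for row, c in enumerate(pivots):
            v[c] = -R[row][free]
        basis.append(v)
    return basis

A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]   # T: R^4 -> R^3, so m = 4
_, pivot_columns = rref(A)
dim_range = len(pivot_columns)
dim_null = len(null_space_basis(A))
```

For this matrix the second row is twice the first, so the range has dimension 2 and the null space has dimension $4 - 2 = 2$.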


What can you say about the matrix of $T$ relative to the basis used above for $R^m$ and any basis for $R^n$?


If $T$ is one to one, then $m \le n$. If $T$ is onto, then $n \le m$. Hence, if $T$ is both one to one and onto, then $m = n$.


If $T$ is one to one, then $\dim N_T = 0$, so $m = \dim R_T \le n$. If $T$ is onto, then $n = \dim R_T \le m$.


Let $T$ be a linear transformation from $R^n$ to $R^n$ (same dimension). Then $T$ is one to one if and only if it is onto.


Now the $m$ and $n$ are equal, so if $T$ is one to one, then formula (1) shows that $\dim R_T = m = n$. But a proper subspace of $R^n$ cannot have the same dimension (Exercise 3, Section 2). If $T$ is onto, then formula (1) shows that $\dim N_T = 0$, and then Theorem 6.2 shows that $T$ is one to one.




With each linear transformation $T: R^m \to R^n$ is associated a linear transformation $T^*: R^n \to R^m$ called the adjoint, which provides important information about $T$ itself.


DEFINITION 6.6 The adjoint of a linear $T: R^m \to R^n$ is the linear transformation $T^*: R^n \to R^m$ defined by the equation

\[ (Tx, y) = (x, T^*y) \qquad \text{for all } x \in R^m \text{ and } y \in R^n. \tag{3} \]


It is not at all obvious that the definition makes sense, so some remarks are called for. Let $y$ be a fixed vector in $R^n$ and consider the function $L(x) = (Tx, y)$. It is plain that this is a linear function from $R^m$ to $R^1$, so by Exercise 3 of Section 5 there is a unique point $a \in R^m$ such that $L(x) = (x, a)$. We define $T^*y$ to be this point $a$. Then equation (3) is satisfied, and what remains is to show that the function $T^*$ is linear. We have

\[ (x, T^*(y + z)) = (Tx, y + z) = (Tx, y) + (Tx, z) = (x, T^*y) + (x, T^*z) = (x, T^*y + T^*z). \]

It follows that $T^*(y + z) = T^*y + T^*z$, for two distinct vectors cannot have the same inner product with every vector $x$. (Their difference would be perpendicular to itself!) The fact that $T^*(ax) = aT^*x$ is proved similarly.


Exercise 3. Do it. 


In terms of matrices, the matrix of $T^*$ is obtained from the matrix of $T$ by interchanging rows and columns, provided the bases are orthonormal.


THEOREM 6.7 Let $T$ be a linear transformation from $R^m$ to $R^n$. Let $T$ and $T^*$ have matrices $\{a_{ij}\}$ and $\{a^*_{ij}\}$ relative to orthonormal bases $e_1, \dots, e_m$ and $f_1, \dots, f_n$. Then

\[ a^*_{ij} = a_{ji}. \tag{4} \]

Proof According to Theorem 4.5, we have

\[ a^*_{ij} = (T^*f_j, e_i) = (e_i, T^*f_j) = (Te_i, f_j) = a_{ji}. \]
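Relative to the natural bases, which are orthonormal, the theorem says that the matrix of the adjoint is the transposed matrix. The defining equation (3) can then be checked on sample vectors. This is a sketch under assumptions made here: the matrix `A`, the vectors `x` and `y`, and the helper names are hypothetical and serve only as an illustration.

```python
def transpose(A):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*A)]

def apply(A, x):
    """Apply the matrix A to the coordinate vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(x, y):
    """Inner product of two coordinate vectors."""
    return sum(a * b for a, b in zip(x, y))

A = [[1, 2, 0], [-1, 3, 4]]    # matrix of T: R^3 -> R^2
A_star = transpose(A)          # matrix of T*: R^2 -> R^3

x = [1, -2, 5]
y = [3, 7]
left = dot(apply(A, x), y)          # (Tx, y)
right = dot(x, apply(A_star, y))    # (x, T*y)
```

The two inner products agree, as equation (3) requires.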


Exercise 4 The theorem provides another way to define the adjoint. If $T$ has matrix $\{a_{ij}\}$ relative to orthonormal bases $e_1, \dots, e_m$ and $f_1, \dots, f_n$, let $T^*$ be the linear transformation from $R^n$ to $R^m$ with matrix $\{a^*_{ij}\}$ relative to the same bases, where $a^*_{ij} = a_{ji}$. Show that if the definition is made this way, then $T^*$ satisfies equation (3).


Exercise 5 $(T^*)^* = T$.

THEOREM 6.8 $N_{T^*} = (R_T)^\perp$ and $R_{T^*} = (N_T)^\perp$.

Proof Suppose that $y \in N_{T^*}$. Then

\[ 0 = (x, T^*y) = (Tx, y) \qquad \text{for every } x \in R^m, \]

which says exactly that $y$ is perpendicular to the range of $T$. On the other hand, if $y$ is perpendicular to the range of $T$, then

\[ 0 = (Tx, y) = (x, T^*y) \qquad \text{for every } x \in R^m. \]

This says that $T^*y$ is perpendicular to every $x \in R^m$, and hence that $T^*y = 0$.

This establishes the first formula in the theorem. If we take orthogonal complements and use Theorem 3.8, we get

\[ R_T = (N_{T^*})^\perp. \tag{5} \]

Replacement of $T$ by $T^*$ and the use of Exercise 5 give the second formula in the theorem. Note that taking orthogonal complements in the second formula in the theorem gives

\[ N_T = (R_{T^*})^\perp. \tag{6} \]

THEOREM 6.9 $\dim R_T = \dim R_{T^*}$.

Proof According to Theorems 6.8 and 3.7, we have

\[ \dim N_T + \dim R_{T^*} = m. \]

Comparing this with Theorem 6.3, we get the theorem.
THEOREM 6.10 If $T$ is a linear transformation from $R^n$ to $R^n$ (same dimension), then the following are equivalent:
(a) $T$ is one to one.
(b) $T$ is onto.
(c) $T^*$ is one to one.
(d) $T^*$ is onto.

Proof Theorem 6.5 shows that (a) and (b) are equivalent, and also that (c) and (d) are equivalent. Theorem 6.9 shows that (b) and (d) are equivalent. [Or, if you like, Theorem 6.8 shows that (a) and (d) are equivalent and that (b) and (c) are equivalent.]


DEFINITION
6.11


7


THEOREM
7.1


Proof


Exercise 1


Exercise 2


The rank of a linear transformation is the dimension of its range. 


MATRICES AND LINEAR EQUATIONS 


The theorems of the last section have some intriguing applications to matrices and to the theory of systems of linear equations. Let $M$ be the matrix $\{a_{ij}\}$; that is,

\[
M = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}.
\]

Each of the columns can be considered as a vector in $R^n$. The column rank of the matrix is the maximum number of linearly independent columns, or, what is the same, the dimension of the subspace that they span. Similarly, the rows can be considered as vectors in $R^m$, and the row rank is the dimension of the subspace that they span. The following theorem is quite surprising.


The row rank of any matrix is equal to its column rank. This number is called the rank of the matrix.


Let $e_1, \dots, e_m$ and $f_1, \dots, f_n$ be the natural bases of $R^m$ and $R^n$ [that is, $e_1 = (1, 0, \dots, 0)$, etc.], and let $T$ be the linear transformation with matrix $M$ relative to these bases. Then the $j$th column of $M$ is precisely $Te_j$, so the column rank is the dimension of the range of $T$. Since the matrix of $T^*$ is obtained by interchanging rows and columns, the row rank of $M$ is the dimension of the range of $T^*$, and Theorem 6.9 says that the two are equal.
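The equality of row rank and column rank can be observed on an example by computing the rank of a matrix and of its transpose. The sketch below is an illustration only: the function `rank` uses Gaussian elimination over the rationals, a standard technique that is not the argument of the proof, and the matrix `M` is a hypothetical example chosen here.

```python
from fractions import Fraction

def rank(A):
    """Dimension of the span of the rows of A, by Gaussian elimination."""
    R = [[Fraction(v) for v in row] for row in A]
    r = 0
    for c in range(len(R[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            factor = R[i][c] / R[r][c]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == len(R):
            break
    return r

def transpose(A):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*A)]

M = [[1, 2, 3], [2, 4, 6], [1, 1, 1], [0, 1, 2]]   # 4 rows, 3 columns
row_rank = rank(M)
column_rank = rank(transpose(M))
```

Here the second row is twice the first and the fourth is the first minus the third, while the third column is twice the second minus the first; both ranks come out equal to 2.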


If M is the matrix of a linear transformation T (relative to any bases), then 
rank M = rank T. 


If $M = C^{-1}NC$, where $M$ and $N$ are matrices and $C$ is an invertible matrix, then rank $M$ = rank $N$.


THEOREM
7.2


THEOREM 
7.3 


DEFINITION 
7.4 




Let $E$ denote the system of equations

\[ E: \quad \sum_{j=1}^{m} a_{ij} x_j = y_i \qquad \text{for } i = 1, \dots, n, \]

and let $E_0$ denote the special system when $y = 0$. ($E_0$ is called the homogeneous system corresponding to the system $E$.) Similarly, let $E^*$ denote the system

\[ E^*: \quad \sum_{j=1}^{n} a_{ji} z_j = w_i \qquad \text{for } i = 1, \dots, m. \]

The system $E$ is simply another form of the equation $Tx = y$, and the system $E^*$ is another form of the equation $T^*z = w$. Therefore, we have the following theorems [formula (5) of Section 6 and Theorem 6.10].


The equations $E$ have a solution if and only if $y$ is orthogonal to every solution of the equations $E_0^*$.


If $m = n$, the following are equivalent:
(a) The equations $E_0$ have only the solution $x = 0$.
(b) The equations $E$ have one and only one solution for every $y$.
(c) The equations $E_0^*$ have only the solution $z = 0$.
(d) The equations $E^*$ have one and only one solution for every $w$.


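Consider the case of two equations in two unknowns:

\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &= y_1, \\
a_{21} x_1 + a_{22} x_2 &= y_2.
\end{aligned}
\tag{1}
\]

Multiply the first equation by $a_{22}$ and the second by $a_{12}$ and subtract. This causes the terms with $x_2$ to drop out and gives

\[ (a_{11} a_{22} - a_{12} a_{21})\, x_1 = a_{22} y_1 - a_{12} y_2. \tag{2} \]

Multiply the first equation by $a_{21}$ and the second by $a_{11}$ and subtract. This causes the terms with $x_1$ to drop out and gives

\[ -(a_{11} a_{22} - a_{12} a_{21})\, x_2 = a_{21} y_1 - a_{11} y_2. \tag{3} \]


The determinant of the matrix

\[ M = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \]

written $\det M$, is the number $a_{11} a_{22} - a_{12} a_{21}$.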




THEOREM
7.5


Proof 


Exercise 3 


Exercise 4 


8 


THEOREM 
8.1 


Proof 


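The matrix $M$ is invertible if and only if its determinant is $\ne 0$. If this is the case, then the solution to equations (1) is given by

\[
x_1 = \frac{\det \begin{pmatrix} y_1 & a_{12} \\ y_2 & a_{22} \end{pmatrix}}{\det M},
\qquad
x_2 = \frac{\det \begin{pmatrix} a_{11} & y_1 \\ a_{21} & y_2 \end{pmatrix}}{\det M}.
\tag{4}
\]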


Notice that the matrices in the numerators in (4) are obtained as follows: For $x_1$, replace the first column of $M$ by $y_1$ and $y_2$; for $x_2$, replace the second column by $y_1$ and $y_2$.
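Formula (4) translates directly into a short computation. The following sketch assumes integer entries so that exact fractions can be used; the function names `det2` and `solve2` and the sample system are hypothetical, chosen here only to illustrate the column-replacement rule.

```python
from fractions import Fraction

def det2(M):
    """Determinant a11*a22 - a12*a21 of a 2 by 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def solve2(M, y):
    """Solve the two equations (1) by formula (4); M must have integer entries."""
    d = det2(M)
    if d == 0:
        raise ValueError("the matrix is not invertible")
    # For x1 replace the first column by y; for x2 replace the second column.
    x1 = Fraction(det2([[y[0], M[0][1]], [y[1], M[1][1]]]), d)
    x2 = Fraction(det2([[M[0][0], y[0]], [M[1][0], y[1]]]), d)
    return x1, x2

M = [[2, 1], [5, 3]]
y = [4, 7]
x1, x2 = solve2(M, y)
```

Substituting the result back into equations (1) recovers the right-hand sides $y_1$ and $y_2$.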


First suppose that $M$ is invertible. Then the equations (1) have a solution for every $y$, and the solution must satisfy (2) and (3). This is plainly impossible if the determinant is 0, because in that case the left sides of (2) and (3) are 0 no matter what $y$ is.


Fill the small gap.


Now suppose that the determinant is $\ne 0$. Then (2) and (3) show that if $y = 0$, then $x = 0$, and Theorem 7.3 shows that $M$ is invertible.


If $v = (a_{11}, a_{21})$ and $w = (a_{12}, a_{22})$ are the two columns of $M$, then $|\det M|$ is the area of the parallelogram with vertices $0$, $v$, $w$, $v + w$. Use this to deduce all of Theorem 7.5 except for formula (4).


The determinant can be defined for square matrices of any size. The 
definition is more complicated than Definition 7.4, but Theorem 7.5 and the 
analog of Exercise 2 remain true. (See Section 11.) 


CONTINUITY OF LINEAR TRANSFORMATIONS 


Every linear transformation $T: R^m \to R^n$ is continuous. Indeed, there is a number $M$ such that

\[ |Tx| \le M|x| \qquad \text{for all } x. \tag{1} \]


The inequality (1) does imply that $T$ is continuous, for

\[ |Tx - Ty| = |T(x - y)| \le M|x - y|. \tag{2} \]

To prove the inequality (1), notice that $\|x\| = |Tx| + |x|$ is an absolute value on $R^m$, and use the fact that any two absolute values are equivalent to get $\|x\| \le N|x|$; hence $|Tx| \le (N - 1)|x|$.


DEFINITION 
8.2 


THEOREM 
8.3 


Proof 


Exercise 1 


Exercise 2 


THEOREM 
8.4 


Proof 




If $T: R^m \to R^n$ is a linear transformation, then

\[ \|T\| = \sup\{|Tx| : |x| = 1\}. \tag{3} \]

The number $\|T\|$ is the least number $M$ for which formula (1) holds. Moreover,

\[ \|T\| = \sup\{|Tx| : |x| \le 1\}. \tag{4} \]


Taking $|x| = 1$ in formula (1), we get $|Tx| \le M$; hence $\|T\| \le M$. On the other hand, if $x \ne 0$, then $x/|x|$ has absolute value 1, so

\[ \left| T \frac{x}{|x|} \right| \le \|T\|; \]

hence

\[ |Tx| \le \|T\|\,|x| \qquad \text{for all } x. \tag{5} \]

This shows that formula (1) holds with $M = \|T\|$, and it was shown in the first sentence of the proof that (1) cannot hold with any smaller number, which proves the first part of the theorem.

To prove the second part, notice that by the definition of things the sup on the right side of (4) is $\ge \|T\|$, while by formula (5) it is $\le \|T\|$.


If $\{a_{ij}\}$ is the matrix of $T$ relative to any orthonormal bases, then

\[ \|T\|^2 \le \sum_{i,j} a_{ij}^2. \tag{6} \]

(Hint: Write $y_i = \sum_{j=1}^{m} a_{ij} x_j$ and use Cauchy-Schwarz.) This gives a different proof of Theorem 8.1 that does not use the fact that any two absolute values on $R^m$ are equivalent.
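Inequality (6) can be observed numerically for a transformation of the plane. The sketch below is an illustration under assumptions made here: the matrix `A` is a hypothetical example relative to the natural bases, and the function `operator_norm_estimate` only samples finitely many points of the unit circle, so it gives a lower estimate of $\|T\|$ rather than its exact value.

```python
import math

def operator_norm_estimate(A, samples=3600):
    """Largest |Ax| found over sample points x of the unit circle.

    A must have two columns; the result never exceeds the true norm
    by more than rounding error.
    """
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x = (math.cos(t), math.sin(t))
        y = [row[0] * x[0] + row[1] * x[1] for row in A]
        best = max(best, math.sqrt(sum(v * v for v in y)))
    return best

def square_sum_bound(A):
    """Square root of the sum of the squares of all entries, as in (6)."""
    return math.sqrt(sum(a * a for row in A for a in row))

A = [[3.0, 1.0], [0.0, 2.0]]
estimate = operator_norm_estimate(A)
bound = square_sum_bound(A)
```

For this matrix the bound is $\sqrt{14}$, and the sampled estimate stays below it, in agreement with (6).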


We shall write $\mathcal{L}_{mn}$ for the space of linear transformations from $R^m$ to $R^n$.


$\|T\|$ is an absolute value on $\mathcal{L}_{mn}$; that is,

\[ \|S + T\| \le \|S\| + \|T\| \qquad \text{and} \qquad \|aT\| = |a|\,\|T\|. \]

The exercise shows that $\mathcal{L}_{mn}$ is a metric space with the metric

\[ d(S, T) = \|S - T\|. \tag{7} \]

The space $\mathcal{L}_{mn}$ is complete.


If $\{T_k\}$ is a Cauchy sequence and $x \in R^m$, then formula (5) shows that

\[ |T_k x - T_l x| \le \|T_k - T_l\|\,|x|, \]

which implies that $\{T_k x\}$ is a Cauchy sequence in $R^n$. Since $R^n$ is complete, we can define

\[ Tx = \lim_{k \to \infty} T_k x. \tag{8} \]


Exercise 3


Exercise 4


THEOREM
8.5


Proof


THEOREM
8.6


This defines $T$ as a function from $R^m$ to $R^n$, and the first problem is to show that $T$ is linear. For this we have

\[ T(x + y) = \lim_{k \to \infty} (T_k x + T_k y) = \lim_{k \to \infty} T_k x + \lim_{k \to \infty} T_k y = Tx + Ty, \]

and, similarly, $T(ax) = aTx$.

The next problem is to show that $T_k \to T$. Let $\epsilon > 0$ be given and choose $k_0$ so that if $k \ge k_0$ and $l \ge k_0$, then $\|T_k - T_l\| < \epsilon$. We shall show that if $k \ge k_0$, then $\|T_k - T\| \le 2\epsilon$. Indeed, for any $x$ with $|x| \le 1$ we can choose $l$ in accordance with definition (8) so that

\[ |T_l x - Tx| < \epsilon \qquad \text{and} \qquad l \ge k_0. \]

Now, if $k \ge k_0$, then we have

\[ |T_k x - Tx| \le |T_k x - T_l x| + |T_l x - Tx| \le \|T_k - T_l\|\,|x| + |T_l x - Tx| < \epsilon |x| + \epsilon \le 2\epsilon \qquad \text{if } |x| \le 1, \]

from which it follows that $\|T_k - T\| \le 2\epsilon$, as claimed.

The theorem can also be proved by choosing bases in $R^m$ and $R^n$, identifying $\mathcal{L}_{mn}$ with the space of $m$ by $n$ matrices, and then the latter with $R^{mn}$, and, finally, using the fact that any two absolute values on $R^{mn}$ are equivalent. Carry out this program.

\[ (S + T)^* = S^* + T^*, \qquad (aT)^* = aT^*, \qquad \|T^*\| = \|T\|. \tag{9} \]


A linear transformation $T$ is one to one if and only if there is a number $m > 0$ such that

\[ |Tx| \ge m|x| \qquad \text{for all } x. \tag{10} \]


If (10) holds, then the null space of $T$ is $\{0\}$, and $T$ is one to one. On the other hand, if $T$ is one to one, then $\|x\| = |Tx|$ is an absolute value on $R^m$, and the equivalence of any two absolute values implies that (10) holds.


Let $T$ be one to one and satisfy

\[ |Tx| \ge m|x|. \]

If $\|S - T\| \le \epsilon < m$, then $S$ is also one to one and satisfies

\[ |Sx| \ge (m - \epsilon)|x|. \tag{11} \]

Proof $|Sx| \ge |Tx| - |Tx - Sx| \ge m|x| - \epsilon|x|$.
THEOREM 8.7 The one-to-one linear transformations from $R^m$ to $R^n$ form an open subset of $\mathcal{L}_{mn}$. The linear transformations from $R^m$ onto $R^n$ form an open subset of $\mathcal{L}_{mn}$.

Proof The first part comes from Theorem 8.6. As for the second part, if $T$ maps $R^m$ onto $R^n$, then $T^*$ is one to one, so there exists $m$ such that

\[ |T^*y| \ge m|y| \qquad \text{for all } y \in R^n. \]

Now, if $\|S - T\| < m$, then by Exercise 4, $\|S^* - T^*\| < m$, so by Theorem 8.6, $S^*$ is one to one, and then $S$ is onto.


Exercise 5 If $R^l \xrightarrow{S} R^m \xrightarrow{T} R^n$, then $\|TS\| \le \|T\|\,\|S\|$.


THEOREM 8.8 The invertible linear transformations from $R^n$ to $R^n$ form an open subset $\mathcal{G}$ of $\mathcal{L}_{nn}$, and the function $\Phi: \mathcal{G} \to \mathcal{G}$ (reciprocal) defined by

\[ \Phi(T) = T^{-1} \]

is continuous.


Proof It is already shown in Theorem 8.7 that $\mathcal{G}$ is open. To show that $\Phi$ is continuous, let $T \in \mathcal{G}$ satisfy $|Tx| \ge m|x|$. If $\|S - T\| \le m/2$, then by Theorem 8.6, $|Sx| \ge (m/2)|x|$, from which it follows that $\|S^{-1}\| \le 2/m$. Now we have $S^{-1} - T^{-1} = T^{-1}(T - S)S^{-1}$; therefore,

\[ \|S^{-1} - T^{-1}\| \le \|T^{-1}\|\,\|T - S\|\,\|S^{-1}\| \le \frac{2}{m}\,\|T^{-1}\|\,\|T - S\|. \tag{12} \]


Exercise 6 Let $T: R^m \to R^n$. There exists $S: R^n \to R^m$
(a) such that $ST = I$ if and only if $T$ is one to one.
(b) such that $TS = I$ if and only if $T$ is onto.
[Note that the $I$ in (a) is the identity on $R^m$, while the $I$ in (b) is the identity on $R^n$.]


Exercise 7 The function $r(T) = \operatorname{rank} T$ is lower semicontinuous on $\mathcal{L}_{mn}$.




9


DEFINITION 
9.1 


DEFINITION 
9.2 


THEOREM 
9.3 


Proof of the Theorem 


Exercise 1 


SELF-ADJOINT TRANSFORMATIONS 


There are two important special kinds of linear transformations that we shall 
look at briefly—the self-adjoint transformations, which correspond geometrically 
to stretchings in various directions, and the orthogonal transformations, which 
correspond to rotations and reflections. These are intrinsically interesting. In
addition they provide a good hold on general linear transformations, for every 
nonsingular linear transformation is a product of one that is self-adjoint and 
one that is orthogonal. 


A linear transformation $H: R^n \to R^n$ is self-adjoint if $H = H^*$, or equivalently if

\[ (Hx, y) = (x, Hy) \qquad \text{for all } x \text{ and } y \text{ in } R^n. \tag{1} \]


If a linear transformation $T$ effects a stretching in a certain direction $e$, then $T$ should simply carry $e$ into a multiple of itself. In this case $e$ is called an eigenvector of $T$.


An eigenvector of a linear transformation $T: R^n \to R^n$ is a nonzero vector $e$ such that $Te = \lambda e$ for some real number $\lambda$. The number $\lambda$ is called the corresponding eigenvalue.


The basic theorem is as follows: 


If $H: R^n \to R^n$ is self-adjoint, then $R^n$ has an orthonormal basis composed of eigenvectors of $H$.


If $e_1, \dots, e_n$ is an orthonormal basis of eigenvectors of $H$, and $\lambda_1, \dots, \lambda_n$ are the corresponding eigenvalues, then geometrically $H$ just effects a stretching by an amount $\lambda_i$ in the direction $e_i$. (Of course, $\lambda_i$ may be negative.) In terms of matrices, relative to the basis $e_1, \dots, e_n$ the matrix of $H$ has $\lambda_1, \dots, \lambda_n$ along the diagonal and 0's everywhere else. In terms of linear equations, with the coordinates relative to the basis $e_1, \dots, e_n$ the equation $y = Hx$ is equivalent to the system

\[ y_i = \lambda_i x_i, \qquad i = 1, \dots, n, \]

which is, of course, trivial to solve.

The main job is to show that $H$ does have an eigenvector!


Give an example of a linear transformation on the plane that has no eigenvector. 


Exercise 2 


Exercise 3 


Exercise 4 




What we shall show is that if $M$ is the maximum of $(Hx, x)$ on the unit sphere, and $e$ is the point where the maximum is assumed, then $e$ is an eigenvector with eigenvalue $M$. The maximum makes sense because $(Hx, x)$ is continuous and the unit sphere is compact. The proof is very quick, but it is a trick. Define

\[ (x, y)_0 = M(x, y) - (Hx, y). \]

The definition of $M$ gives $(x, x)_0 \ge 0$ and the fact that $H$ is self-adjoint gives $(x, y)_0 = (y, x)_0$, so this is an inner product in the sense of Theorems 2.3 and 1.2 of Chapter 7. Therefore, the Cauchy-Schwarz inequality gives

\[ |(x, y)_0|^2 \le |(x, x)_0|\,|(y, y)_0|. \]

If $x = e$, then $(x, x)_0 = 0$, so we have

\[ (Me - He, y) = (e, y)_0 = 0 \qquad \text{for every } y, \]

which implies that $Me - He = 0$, or that $He = Me$.

Now induction carries the rest of the proof. Set $V = \{e\}^\perp$. Clearly, $H(V) \subset V$, for if $x \in V$, then $(Hx, e) = (x, He) = (x, Me) = 0$. (Here again we use the fact that $H$ is self-adjoint.) Therefore, the restriction of $H$ to $V$ is a self-adjoint linear transformation on $V$, and $V$, of course, is the same as $R^{n-1}$. By induction on the dimension, $V$ has an orthonormal basis of eigenvectors, which, combined with $e$, gives an orthonormal basis of $R^n$.


To be perfectly proper about this inductive proof, we should start by defining a self-adjoint transformation $H: V \to V$, where $V$ is any subspace of $R^n$, and then state the theorem in the form that if $H: V \to V$ is self-adjoint, then $V$ has an orthonormal basis composed of eigenvectors. Give the definition by using formula (1) and the proper proof.


The eigenvalues of a linear transformation are uniquely determined. They are simply the numbers $\lambda$ such that $Te = \lambda e$ for some nonzero vector $e$. The eigenvectors are determined, too, but the orthonormal basis in Theorem 9.3 is not determined. For instance, if $H = I$, then $\lambda = 1$ is the only eigenvalue, but every nonzero vector $e$ is an eigenvector. The next two exercises show that the eigenvalues that appear in Theorem 9.3 are in fact all the eigenvalues and show how much uniqueness there is in the orthonormal basis.


Let $T: R^n \to R^n$. Let $e_1, \ldots, e_n$ be linearly independent eigenvectors with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then $T$ has no other eigenvalues.


Let $H: R^n \to R^n$ be self-adjoint. If $e$ and $f$ are eigenvectors with distinct eigenvalues, then $e \perp f$. As far as the uniqueness in Theorem 9.3 is concerned, the




Exercise 5 


THEOREM 
9.4 


Proof 


conclusion of the two exercises is as follows. Let $H: R^n \to R^n$ be self-adjoint, and let $\lambda_1, \ldots, \lambda_k$ be the distinct eigenvalues. Let $V_j$ be the null space of $H - \lambda_j I$, that is, the set of eigenvectors with eigenvalue $\lambda_j$ together with 0. Choose any orthonormal basis whatever of each $V_j$. Then the union of these is an orthonormal basis of $R^n$ composed of eigenvectors, and this is the only way to get such a basis.


Prove the assertions above. 


A closer examination of the proof of Theorem 9.3 gives some interesting information about the eigenvalues. Note first that if $\lambda$ is any eigenvalue of the self-adjoint $H$ and $e$ is a corresponding eigenvector of length 1, then $(He, e) = (\lambda e, e) = \lambda$. Therefore, the eigenvalue $M$ produced in the proof is the largest eigenvalue. By carrying out each step in the induction, we shall pick out the eigenvalues in decreasing order. Indeed, let

$$\lambda_1 = \sup_{|x| = 1} (Hx, x),$$

and let $e_1$ be a point where the maximum is assumed, so that $\lambda_1$ is the largest eigenvalue and $e_1$ is a corresponding eigenvector. Now let $V_1 = \{e_1\}^\perp$, let

$$\lambda_2 = \sup\{(Hx, x) : |x| = 1 \text{ and } x \in V_1\},$$

and let $e_2$ be a point where the maximum is assumed. By the same proof, $\lambda_2$ is an eigenvalue and $e_2$ is a corresponding eigenvector. In general, when $\lambda_1, \ldots, \lambda_k$ and $e_1, \ldots, e_k$ have been picked out, let $V_k = \{e_1, \ldots, e_k\}^\perp$, let

$$\lambda_{k+1} = \sup\{(Hx, x) : |x| = 1 \text{ and } x \in V_k\},$$

and let $e_{k+1}$ be a point where the maximum is assumed. In this way we obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ with corresponding orthonormal eigenvectors $e_1, \ldots, e_n$. Exercise 3 shows that $H$ has no other eigenvalues.

This method picks out the eigenvalues successively, starting with the largest. There is an important formula that picks out the $k$th largest directly, without making use of the previous ones and without making use of the eigenvectors.


If $H$ is self-adjoint, then the $k$th largest eigenvalue of $H$ is given by

$$\lambda_k = \inf_W \bigl( \sup\{(Hx, x) : |x| = 1 \text{ and } x \in W\} \bigr), \qquad (2)$$

where the inf is taken over all subspaces $W$ of dimension $n - k + 1$.


Let $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$ be the eigenvalues with corresponding orthonormal eigenvectors $e_1, \ldots, e_n$. Since $V_k$ has dimension $n - k$, it is


DEFINITION 
9.5 


Exercise 6 


Exercise 7 


THEOREM 
9.6 


Proof 


Exercise 8 




plain that $\lambda_{k+1} \le \mu_{k+1}$. Now, if $W$ is any subspace of dimension $n - k$, then $W \cap [e_1, \ldots, e_{k+1}] \ne \{0\}$ (why?); so there exists a point $x \in W$ with $|x| = 1$ and

$$x = \sum_{i=1}^{k+1} x_i e_i,$$

and we have

$$(Hx, x) = \sum_{i=1}^{k+1} \mu_i x_i^2 \ge \mu_{k+1}.$$

This shows that for each $W$ the sup in formula (2) is $\ge \mu_{k+1}$, so $\lambda_{k+1} \ge \mu_{k+1}$.


The statement $H \ge K$ means that both $H$ and $K$ are self-adjoint and that $(Hx, x) \ge (Kx, x)$ for every $x$. If $H \ge 0$, then $H$ is called positive definite.


Let $H$ and $K$ be self-adjoint with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ and $\mu_1 \ge \cdots \ge \mu_n$. If $H \ge K$, then $\lambda_j \ge \mu_j$ for each $j$. (Hint: Use Theorem 9.4.)


This result is quite important in numerical work with eigenvalues. These cannot be computed explicitly in general, but Exercise 6 can be used to obtain good bounds. Given $H$, you look for $K$ and $L$ such that $L \ge H \ge K$ and such that the eigenvalues of $K$ and $L$ can be computed explicitly.


If $H$ is self-adjoint, then

$$\|H\| = \sup_{|x| = 1} |(Hx, x)|.$$


Theorem 9.3 can be used to construct interesting functions of self-adjoint 
transformations. For example, 


Every positive definite transformation has a unique positive definite square root. 


Let $H$ be self-adjoint with eigenvalues $\lambda_1, \ldots, \lambda_n$ and corresponding eigenvectors $e_1, \ldots, e_n$. If $H \ge 0$, then by Exercise 6 each $\lambda_j \ge 0$; so we can define $\mu_j$ to be the nonnegative square root of $\lambda_j$, and then define $K$ by $Ke_j = \mu_j e_j$. Then $K^2 e_j = K(Ke_j) = \mu_j Ke_j = \mu_j^2 e_j = \lambda_j e_j = He_j$ for each $j$, which implies that $K^2 = H$. Exercise 6 shows that $K$ is positive definite.
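
The construction of the square root can be carried out by hand for a 2 by 2 symmetric matrix. The Python sketch below is an illustration only: the function name `sqrt_psd_2x2`, the closed-form eigenvalue formula for a 2 by 2 matrix, and the sample matrix are assumptions introduced for the example. It forms $K$ by taking $Ke_j = \mu_j e_j$ with $\mu_j$ the nonnegative square root of $\lambda_j$.

```python
import math

def sqrt_psd_2x2(H):
    """Square root of a positive definite symmetric matrix [[a, b], [b, c]]."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    if b == 0:
        # H is already diagonal, so the standard basis consists of eigenvectors.
        return [[math.sqrt(a), 0.0], [0.0, math.sqrt(c)]]
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    K = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mean + rad, mean - rad):        # the two eigenvalues
        vx, vy = b, lam - a                     # eigenvector for lam (b != 0)
        norm = math.hypot(vx, vy)
        e = (vx / norm, vy / norm)              # unit eigenvector
        mu = math.sqrt(max(lam, 0.0))           # nonnegative square root
        for i in range(2):
            for j in range(2):
                K[i][j] += mu * e[i] * e[j]     # K = sum of mu_j e_j e_j^T
    return K

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

H = [[2.0, 1.0], [1.0, 2.0]]      # hypothetical example with eigenvalues 3 and 1
K = sqrt_psd_2x2(H)
K2 = matmul(K, K)                 # should reproduce H up to rounding
```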


Prove the uniqueness. 




Remark 


Exercise 9 


Exercise 10 


Exercise 11 


Exercise 12 


Exercise 13 


10 


DEFINITION 
10.1 


The term positive definite refers to the fact that the sign of $(Hx, x)$ is definite. It is not sometimes positive and sometimes negative. The transformation $H$ is negative definite if $(Hx, x) \le 0$ for all $x$. It is indefinite if $(Hx, x)$ is sometimes positive and sometimes negative. The transformation $H$ is strictly positive definite, written $H > 0$, if $(Hx, x) > 0$ for all $x \ne 0$, and is strictly negative definite if $(Hx, x) < 0$ for all $x \ne 0$.


A self-adjoint H is strictly positive definite if and only if all eigenvalues are >0. 


Are there any linear transformations that are both positive definite and negative 
definite? 


The converse of Theorem 9.3 is also true. If $R^n$ has an orthonormal basis composed of eigenvectors of $H$, then $H$ is self-adjoint. What can you say about $T$ if $R^n$ has a basis (but not orthonormal) composed of eigenvectors of $T$?


The projection $P$ on a subspace $V$ of $R^n$ is defined as follows: Each $x \in R^n$ can be written uniquely in the form $x = x' + x''$ with $x' \in V$ and $x'' \in V^\perp$, and then $Px = x'$. Show that $P$ is self-adjoint and that $P^2 = P$. What are the eigenvalues? Show that if $Q$ is self-adjoint and $Q^2 = Q$, then $Q$ is the projection on some subspace.


Let $H: R^n \to R^n$ be self-adjoint with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. Let $P$ be the projection on a subspace $V$ of dimension $m$, and let $\mu_1 \ge \cdots \ge \mu_n$ be the eigenvalues of $K = PHP$. Show that

$$\lambda_k \ge \mu_k \ge \lambda_{k+n-m} \quad \text{for } k \le m.$$

[Hint: Show first that in Theorem 9.4 the inf is the same if it is taken over all subspaces $W$ of dimension $\ge n - k + 1$. Show next that if $W$ has dimension $n - k + 1$, then $W' = V \cap W$ has dimension $\ge m - k + 1 = n - (k + n - m) + 1$. Now use Theorem 9.4.]


ORTHOGONAL TRANSFORMATIONS 


The orthogonal transformations correspond geometrically to rotations and reflec- 
tions, at least in R*. 


A linear transformation $U: R^m \to R^n$ is orthogonal if $(Ux, Uy) = (x, y)$ for every $x$ and $y$ in $R^m$.


THEOREM 
10.2 


Proof 


Exercise 1 


Exercise 2 


Exercise 3 


THEOREM 
10.3 


Proof 




The following conditions are equivalent on a linear $U: R^m \to R^n$.

(a) $U$ is orthogonal.

(b) $U^*U = I$.

(c) For every orthonormal basis $e_1, \ldots, e_m$ of $R^m$, the set $Ue_1, \ldots, Ue_m$ is orthonormal in $R^n$.

(d) For some orthonormal basis $e_1, \ldots, e_m$ of $R^m$, the set $Ue_1, \ldots, Ue_m$ is orthonormal in $R^n$.


We shall show that (d) implies (a) and leave the other implications as an exercise. If $e_1, \ldots, e_m$ is orthonormal and

$$x = \sum_{i=1}^m x_i e_i \quad \text{and} \quad y = \sum_{i=1}^m y_i e_i,$$

then

$$(x, y) = \sum_{i,j} x_i y_j (e_i, e_j) = \sum_i x_i y_i,$$

because $(e_i, e_j)$ is 0 if $i \ne j$ and is 1 if $i = j$. The same argument applied to $Ue_1, \ldots, Ue_m$ in place of $e_1, \ldots, e_m$ shows that

$$(Ux, Uy) = \sum_i x_i y_i = (x, y).$$


Show that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) in order to complete the proof of the theorem.


A linear $U: R^m \to R^n$ is orthogonal if and only if it preserves distance in the sense that

$$|Ux| = |x| \quad \text{for all } x \in R^m.$$

(Hint: Prove the identity $4(z, w) = |z + w|^2 - |z - w|^2$ and apply it to $x$ and $y$ and to $Ux$ and $Uy$.)


What can you say about UU* when U is orthogonal? What can you say in 
the case where m = n? 


If $T: R^m \to R^n$ is one to one, then $T = UH$, where $U: R^m \to R^n$ is orthogonal and $H: R^m \to R^m$ is positive definite.


To see what $U$ and $H$ must be, suppose that the theorem is true. Then $T^*T = (UH)^*(UH) = H^*U^*UH = H^2$. Now $T^*T$ is strictly positive definite from $R^m$ to $R^m$, for

$$(T^*Tx, x) = (Tx, Tx) = |Tx|^2 > 0 \quad \text{for } x \ne 0.$$




Exercise 4 


Exercise 5 


DEFINITION 
10.4 


THEOREM 
10.5 


Proof 


Therefore, we can start back at the beginning and use Theorem 9.6 to define $H$ to be the positive-definite square root of $T^*T$ and then define $U$ to be $TH^{-1}$. What has to be shown is that $U$ is orthogonal, and this follows from

$$U^*U = (H^{-1})^* T^*T H^{-1} = H^{-1} H^2 H^{-1} = I.$$


Why is the $H$ above invertible?


In the proof we used the fact that $(H^{-1})^* = (H^*)^{-1}$. Prove this for any invertible $H: R^m \to R^m$.


Next we give a theorem to show that an orthogonal transformation $U: R^n \to R^n$ can be interpreted geometrically as either a rigid motion that leaves the origin fixed (i.e., a rotation) or else as a reflection followed by such a rigid motion. The following notation is customary for the set of orthogonal transformations.


$O_n$ is the set of orthogonal transformations from $R^n$ to $R^n$.


Every $U \in O_n$ can be joined by a path in $O_n$ either to the identity $I$ or to the reflection $J$ defined by

$$Jx = (x_1, \ldots, x_{n-1}, -x_n),$$

which is the reflection across the subspace $x_n = 0$.


We show first that $U$ can be joined by a path in $O_n$ to a $V$ with the property that $Ve_n = \pm e_n$. If $e_n$ and $Ue_n$ are linearly dependent, then $Ue_n = \pm e_n$, since both $e_n$ and $Ue_n$ have length 1. Otherwise, the two span a two-dimensional space $V_2$ in which we shall make a suitable rotation. Let $f = Ue_n$ and let $g$ be a unit vector in $V_2$ that is perpendicular to $f$. (There are two of these, and we just choose one.) Since $e_n$ is in $V_2$ and has length 1, there is a number $\theta$, $0 \le \theta < 2\pi$, such that

$$e_n = f \cos\theta - g \sin\theta. \qquad (1)$$

Now define $R_t$, $0 \le t \le \theta$, as follows: $R_t x = x$ if $x \in V_2^\perp$ and

$$R_t f = f \cos t - g \sin t,$$
$$R_t g = f \sin t + g \cos t.$$


Exercise 6 Verify that $R_t$ is orthogonal.
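
A numerical spot check is no substitute for the verification asked for in the exercise, but it can make the definition of the rotation concrete. In the Python sketch below, the vectors `f`, `g`, `x`, `y`, the angle `t`, and the helper names are all hypothetical choices made for illustration; the rotation acts in the plane spanned by the orthonormal pair `f`, `g` in $R^3$ and fixes the orthogonal complement.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def rotate(t, f, g, x):
    """Apply the rotation: turn the (f, g)-plane by t, fix its orthogonal complement."""
    xf, xg = dot(x, f), dot(x, g)
    rf = [fi * math.cos(t) - gi * math.sin(t) for fi, gi in zip(f, g)]  # image of f
    rg = [fi * math.sin(t) + gi * math.cos(t) for fi, gi in zip(f, g)]  # image of g
    # Split x into its part orthogonal to the plane plus xf*f + xg*g,
    # then replace f and g by their images.
    return [xi - xf * fi - xg * gi + xf * a + xg * b
            for xi, fi, gi, a, b in zip(x, f, g, rf, rg)]

s = 1 / math.sqrt(2)
f = [s, s, 0.0]                 # orthonormal pair spanning the plane of rotation
g = [0.0, 0.0, 1.0]
x = [1.0, 2.0, 3.0]
y = [-1.0, 0.5, 4.0]
t = 0.8

before = dot(x, y)
after = dot(rotate(t, f, g, x), rotate(t, f, g, y))   # should equal (x, y)
```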


Exercise 7 


THEOREM 
10.6 


Proof 


Exercise 8 




$R_0 = I$ and $R_\theta f = e_n$ [formula (1)], so $U_t = R_t U$ is a path from $U_0 = U$ to $U_\theta = V$, where $V$ does have the required property that $Ve_n = \pm e_n$. (In fact, $Ve_n = e_n$, but we have to take account of the initial possibility that $Ue_n = -e_n$.)

The proof is finished by induction on the dimension. With the usual identification, $R^{n-1}$ is the space spanned by $e_1, \ldots, e_{n-1}$. If $V'$ is the restriction of $V$ to $R^{n-1}$, then $V'$ is an orthogonal transformation from $R^{n-1}$ to $R^{n-1}$. (Why?) Therefore, induction gives a path $V'_t$ in $O_{n-1}$ which joins $V'$ to either $I'$ or $J'$. Let $V_t$ be defined by $V_t x' = V'_t x'$ if $x' \in R^{n-1}$ and $V_t e_n = e_n$ (or $V_t e_n = -e_n$ if it happens that $Ve_n = -e_n$). Then $V_t$ is a path in $O_n$ that joins $V$ to one of the following:

$$Wx = x, \quad Wx = (J'x', -x_n), \quad Wx = (J'x', x_n), \quad Wx = (x', -x_n).$$


Show how to join the first two to $I$ and the second two to $J$.


In the next section we shall show that the two possibilities in Theorem 10.5 are mutually exclusive, or, in other words, that $I$ cannot be joined to $J$ by a path in $O_n$. Another way to interpret this is that $O_n$ has exactly two connected components, one containing $I$ and the other containing $J$. Note that if we carry out the inductive steps in this proof, they show that the initial $U$ can be joined to either $I$ or $J$ by a path that consists of a finite number of rotations, each rotation taking place in a two-dimensional subspace and leaving the orthogonal complement of this two-dimensional subspace fixed.

There is a corresponding theorem for the set $\mathcal{I}$ of invertible transformations on $R^n$.


Each $T \in \mathcal{I}$ can be joined by a path in $\mathcal{I}$ to either the identity $I$ or to the reflection $J$.


Use Theorem 10.3 to write $T = UH$ and Theorem 10.5 to find a path $U_t$ in $O_n$ that joins $U$ to either $I$ or to $J$. Set $H_t = (1 - t)H + tI$ and then $T_t = U_t H_t$.


Show that each $H_t$ is strictly positive definite, and therefore that $T_t$ is a path in $\mathcal{I}$ which joins $T$ to either $I$ or $J$.


Again the same comment holds. $I$ cannot be joined to $J$ by a path in $\mathcal{I}$, so $\mathcal{I}$ has exactly two connected components, one containing $I$ and the other containing $J$. This is shown in the next section.




1] 


DEFINITION 
11.1 


THEOREM 
11.2 


Proof 


Exercise 1 


Exercise 2 


DETERMINANTS 


In this section we shall give the definition of the determinant of an n by n 
matrix and shall establish the properties that will be needed later. 


A permutation of a set $I$ is a one-to-one function from $I$ onto itself. If $\mu$ and $\nu$ are permutations of $I$, the product $\mu\nu$ is the composite $\mu \circ \nu$. A permutation $\mu$ is a transposition if there exist two distinct points $i$ and $j$ in $I$ such that $\mu(i) = j$, $\mu(j) = i$ and $\mu(k) = k$ for all other points of $I$.


We shall be interested only in the case where $I$ is the set $\{1, \ldots, n\}$. It is clear that every permutation is a product of transpositions: first interchange 1 and $\mu(1)$, then interchange 2 and $\mu(2)$, and so on. This can be done in many ways. For instance, if $\nu$ is a transposition, then $\nu^2 = 1$ (the identity), so in any product of transpositions, factors $\nu^2$ can be inserted at random without changing the product.


In the various expressions of a permutation $\mu$ as a product of transpositions, the number of factors is always even, or else the number of factors is always odd.


If $f$ is a real-valued function on $R^n$ and $\mu$ is a permutation of $I = \{1, \ldots, n\}$, define the function $f_\mu$ by

$$f_\mu(x_1, \ldots, x_n) = f(x_{\mu(1)}, \ldots, x_{\mu(n)}).$$


Show that $(f_\mu)_\nu = f_{\nu\mu}$.


Let $f$ be the particular function

$$f(x) = \prod_{i<j} (x_i - x_j), \qquad (1)$$

where $\prod$ denotes the product over the indices in question. If $n = 3$, for example, then $f(x) = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3)$. Show that if $\mu$ is a transposition, then $f_\mu(x) = -f(x)$.


The two exercises together show that if $\mu$ is the product of an even number of transpositions, then $f_\mu(x) = f(x)$, while if $\mu$ is a product of an odd number of transpositions, then $f_\mu(x) = -f(x)$. This characterizes the evenness or oddness of the number of transpositions directly in terms of $\mu$.
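
The two descriptions of parity can be compared by machine for small $n$. The following Python sketch is illustrative only; the function names and the 0-indexed tuple representation of a permutation are assumptions made for the example. One function evaluates the ratio $f_\mu(x)/f(x)$ at a point with distinct coordinates, the other writes the permutation as a product of transpositions by successive interchanges and counts the factors.

```python
from itertools import permutations

def sign_by_product(mu):
    """Ratio f_mu(x) / f(x) for f(x) = prod_{i<j} (x_i - x_j) at x = (0, 1, ..., n-1)."""
    n = len(mu)
    num = den = 1
    for i in range(n):
        for j in range(i + 1, n):
            num *= mu[i] - mu[j]    # factor of f_mu(x): x_{mu(i)} - x_{mu(j)}
            den *= i - j            # factor of f(x):    x_i - x_j
    return num // den               # exact, since |num| == |den|

def sign_by_transpositions(mu):
    """Undo mu by interchanges and return +1 or -1 by the parity of their number."""
    p = list(mu)
    count = 0
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]  # one transposition; puts j in place j
            count += 1
    return 1 if count % 2 == 0 else -1

# The two computations agree on every permutation of {0, 1, 2, 3}.
agree = all(sign_by_product(mu) == sign_by_transpositions(mu)
            for mu in permutations(range(4)))
```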




DEFINITION 11.3  The sign of the permutation $\mu$, written $\varepsilon(\mu)$, is the number 1 if $\mu$ is the product of an even number of transpositions and is the number $-1$ if $\mu$ is the product of an odd number of transpositions.


Exercise 3  $\varepsilon(\mu\nu) = \varepsilon(\mu)\varepsilon(\nu)$.

DEFINITION 11.4  The determinant of the $n$ by $n$ matrix $A = \{a_{ij}\}$ is the number

$$\det A = \sum_\mu \varepsilon(\mu)\, a_{1\mu(1)} a_{2\mu(2)} \cdots a_{n\mu(n)}, \qquad (2)$$

where the sum is taken over all permutations $\mu$ of $I = \{1, \ldots, n\}$.


Exercise 4  Write out the sum in full in the 2 by 2 case and in the 3 by 3 case. [There are $n!$ permutations of the set $\{1, \ldots, n\}$, so there are $n!$ terms in the sum (2). Already this becomes a little burdensome at $n = 4$, for there are 24 terms.]
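
The defining sum can be transcribed almost literally into a short program. The Python sketch below is an illustration under stated assumptions: indices are 0-based, the sign is computed by counting inversions (which has the same parity as any count of transpositions), and the function names and sample matrices are hypothetical. With $n!$ terms it is only practical for small $n$.

```python
from itertools import permutations

def sign(mu):
    """Sign of a permutation given as a tuple of images of 0, ..., n-1."""
    inversions = sum(1 for i in range(len(mu))
                     for j in range(i + 1, len(mu)) if mu[i] > mu[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant as the sum over all permutations mu of sign(mu) * prod_i A[i][mu(i)]."""
    n = len(A)
    total = 0
    for mu in permutations(range(n)):
        term = sign(mu)
        for i in range(n):
            term *= A[i][mu[i]]
        total += term
    return total

d2 = det([[1, 2], [3, 4]])                       # two terms: 1*4 - 2*3
d3 = det([[1, 2, 3], [4, 5, 6], [7, 8, 10]])     # six terms
```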


THEOREM 11.5  If $B$ is obtained from $A$ by making a permutation $\pi$ of the columns, then $\det B = \varepsilon(\pi) \det A$.

Proof  If $b_{ij} = a_{i\pi(j)}$, then

$$\det B = \sum_\mu \varepsilon(\mu)\, b_{1\mu(1)} \cdots b_{n\mu(n)} = \sum_\mu \varepsilon(\mu)\, a_{1\pi\mu(1)} \cdots a_{n\pi\mu(n)}.$$

If we write $\pi\mu = \nu$ and use the fact that $\varepsilon(\mu) = \varepsilon(\pi)\varepsilon(\nu)$ and the fact that as $\mu$ runs through all the permutations of $\{1, \ldots, n\}$, so does $\nu$, we get

$$\det B = \varepsilon(\pi) \sum_\nu \varepsilon(\nu)\, a_{1\nu(1)} \cdots a_{n\nu(n)} = \varepsilon(\pi) \det A.$$


If $A$ is a matrix, then $A^*$ is, of course, the matrix obtained by interchanging rows and columns. In other words, if $A$ is the matrix of a linear transformation $T$ relative to an orthonormal basis, then $A^*$ is the matrix of $T^*$.

THEOREM 11.6  $\det A^* = \det A$.

Proof  We have

$$\det A^* = \sum_\mu \varepsilon(\mu)\, a^*_{1\mu(1)} \cdots a^*_{n\mu(n)} = \sum_\mu \varepsilon(\mu)\, a_{\mu(1)1} \cdots a_{\mu(n)n}.$$




COROLLARY 
11.7 


Proof 


COROLLARY 
11.8 


Proof 


THEOREM 
11.9 


Proof 


If we write $\mu^{-1} = \nu$ and use the fact that $\varepsilon(\mu^{-1}) = \varepsilon(\mu)$ and the fact that as $\mu$ runs through all permutations, so does $\mu^{-1}$, we get

$$\det A^* = \sum_\nu \varepsilon(\nu)\, a_{1\nu(1)} \cdots a_{n\nu(n)} = \det A.$$


If $B$ is obtained from $A$ by making a permutation $\pi$ of the rows, then $\det B = \varepsilon(\pi) \det A$.


The permutation of the rows can be effected by first interchanging rows and columns, then making the permutation on the columns, and finally interchanging rows and columns again. The first and last operations leave the determinant unchanged, while the middle one multiplies by $\varepsilon(\pi)$.


If $A$ has two equal rows or two equal columns, then $\det A = 0$.


The transposition interchanging these rows (or columns) has no effect, but on the other hand it multiplies the determinant by $-1$.


$\det AB = \det A \det B$.

Let $A = \{a_{ij}\}$, $B = \{b_{ij}\}$, and $AB = \{c_{ij}\}$, so

$$c_{ij} = \sum_k a_{ik} b_{kj} \qquad (3)$$

and

$$\det AB = \sum_\mu \varepsilon(\mu)\, c_{1\mu(1)} \cdots c_{n\mu(n)}. \qquad (4)$$

Now we shall substitute (3) into (4), but for each $i$ we shall have to use a different index of summation. We get

$$\det AB = \sum_\mu \sum_{k_1, \ldots, k_n} \varepsilon(\mu)\, a_{1k_1} b_{k_1\mu(1)} \cdots a_{nk_n} b_{k_n\mu(n)}.$$

Fix $k_1, \ldots, k_n$ and consider the sum with respect to $\mu$. The sum

$$\Delta_{k_1, \ldots, k_n} = \sum_\mu \varepsilon(\mu)\, b_{k_1\mu(1)} \cdots b_{k_n\mu(n)}$$

is just the determinant of the matrix in which the first row is the $k_1$st row of $B$, and in general the $j$th row is the $k_j$th row of $B$. If $k_1, \ldots, k_n$ are not all distinct, then this is 0 by Corollary 11.8. If $k_1, \ldots, k_n$ are all distinct, then the function $\pi$ defined by $\pi(j) = k_j$ is a permutation, and by Corollary 11.7 we have

$$\Delta_{k_1, \ldots, k_n} = \varepsilon(\pi) \det B.$$


THEOREM 
11.10 


Proof 


DEFINITION 
11.11 


THEOREM 
11.12 


Proof 




Thus,

$$\det AB = \det B \sum_\pi \varepsilon(\pi)\, a_{1\pi(1)} \cdots a_{n\pi(n)} = \det A \det B.$$

If $A$ has $\lambda_1, \ldots, \lambda_n$ along the diagonal and 0's everywhere else, then $\det A = \lambda_1 \cdots \lambda_n$.


This is clear from the definition. If $\mu$ is not the identity permutation in formula (2), then the corresponding term in the sum is 0, for it contains an $a_{ij}$ off the diagonal.
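
The product rule, the rule for $A^*$, and the diagonal case can all be spot-checked on small integer matrices. The Python sketch below is illustrative only: the helper names and the sample matrices are assumptions made for the example, and the determinant is computed directly from the permutation sum with 0-based indices.

```python
from itertools import permutations

def det(A):
    """Permutation-sum determinant; the sign comes from counting inversions."""
    n = len(A)
    total = 0
    for mu in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if mu[i] > mu[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= A[i][mu[i]]
        total += term
    return total

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]     # hypothetical sample matrices
B = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
D = [[2, 0, 0], [0, -3, 0], [0, 0, 5]]     # diagonal matrix

product_rule_holds = det(matmul(A, B)) == det(A) * det(B)
transpose_rule_holds = det(transpose(A)) == det(A)
diagonal_rule_holds = det(D) == 2 * (-3) * 5
```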


Theorems 11.9 and 11.10 make it possible to define the determinant of a 
linear transformation. 


If $T$ is a linear transformation from $R^n$ to $R^n$, then $\det T$ is the determinant of the matrix of $T$ relative to any basis.


For the definition to make sense it must be true that the determinant of one matrix of $T$ is the same as the determinant of any other matrix of $T$. If $A$ and $B$ are matrices of $T$ relative to different bases, then, according to Exercise 10 of Section 5, we have $B = CAC^{-1}$, where $C$ is some invertible matrix. First we notice that if $I$ is the identity matrix, then by Theorem 11.10, $\det I = 1$. Next we notice that by Theorem 11.9 we have $\det C \det C^{-1} = 1$, because $CC^{-1} = I$. Finally, again by Theorem 11.9, we have $\det B = \det C \det A \det C^{-1} = \det A$.


The function $d(T) = \det T$ is characterized by the following properties:

(a) $d(T)$ is continuous from $\mathcal{L}_{nn}$ to $R^1$.

(b) $d(T^*) = d(T)$.

(c) $d(ST) = d(S)\,d(T)$.

(d) If $H$ is self-adjoint with eigenvalues $\lambda_1, \ldots, \lambda_n$, then $d(H) = \lambda_1 \cdots \lambda_n$.

To see (a), identify $\mathcal{L}_{nn}$ with $R^{n^2}$ by identifying each linear transformation with its matrix relative to the standard orthonormal basis of $R^n$. The absolute value on $\mathcal{L}_{nn}$ is equivalent to the absolute value on $R^{n^2}$, for any two absolute values on $R^{n^2}$ are equivalent. It is plain that the determinant is continuous on $R^{n^2}$, for it is a polynomial in the $n^2$ coordinates. Parts (b) and (c) come from Theorems 11.6 and 11.9. Part (d) comes from Theorem 11.10. (Use the orthonormal basis of eigenvectors.)

Now we have to show that if d is any function with the properties 
listed, then d(T) must be the determinant of T. In the course of the 
proof we shall find a number of additional basic properties of the deter- 
minant, which we shall list as theorems as we go along. Notice first that 




THEOREM 
11.13 


Proof 


THEOREM 
11.14 


Proof 


THEOREM 
11.15 


Proof 


again we have

$$d(I) = 1 \quad \text{and} \quad d(J) = -1,$$

for $I$ has $n$ eigenvalues all equal to 1, and $J$ has $n - 1$ eigenvalues equal to 1 and one equal to $-1$. In both cases the standard basis of $R^n$ is an orthonormal basis of eigenvectors.


$T$ is invertible if and only if $d(T) \ne 0$.


If $T$ is invertible, then $I = TT^{-1}$, so $1 = d(T)\,d(T^{-1})$, which certainly means that $d(T) \ne 0$. If $T$ is not invertible, then it has a nontrivial null space $N$. If $P$ is the projection on $N^\perp$ (see Exercise 12 of Section 9), then $T = TP$. Indeed, if $x \in N$, then $Tx$ and $Px$ are both 0, while if $x \in N^\perp$, then $Px = x$. Thus, $d(T) = d(T)\,d(P)$. But $d(P) = 0$, for $P$ is self-adjoint and 0 is an eigenvalue (with any nonzero vector in $N$ as a corresponding eigenvector).


The space $\mathcal{I}$ of invertible transformations from $R^n$ to $R^n$ has exactly two connected components.


According to Theorem 10.6, $\mathcal{I}$ has at most two components, and $I$ and $J$ cannot be connected, for the continuous function $d(T)$ takes the value 1 at $I$ and the value $-1$ at $J$, but does not take the value 0 anywhere on $\mathcal{I}$.


The space $O_n$ of orthogonal transformations has exactly two connected components. If $U$ is orthogonal, then $d(U) = 1$ if $U$ is in the component of $I$, and $d(U) = -1$ if $U$ is in the component of $J$.


If $U$ is orthogonal, then $I = U^*U$; so

$$1 = d(U^*)\,d(U) = d(U)^2,$$

which shows that $d(U) = \pm 1$. A continuous function that takes only the values $\pm 1$ must be constant on any connected set. Hence, $d(U) = 1$ on the component of $I$ and $d(U) = -1$ on the component of $J$. Theorem 10.5 shows that there are no other components.


These theorems characterize $d$ as the determinant. If $T$ is not invertible, then $d(T) = 0$. If $T$ is invertible, then $T = UH$, where $U$ is orthogonal and $H$ is positive definite; then $d(T) = d(U)\,d(H)$. The value of $d(U)$ is determined by Theorem 11.15 and the value of $d(H)$ is determined by property (d) of Theorem 11.12. There are various ways to characterize determinants. This one happens to be natural if one has in mind the applications to volumes and areas, although in that case it is really the absolute value of the determinant that is relevant.


THEOREM 
11.16 


Exercise 5 


Exercise 6 


Exercise 7 




The function J(T) = |det T| is characterized by the following properties: 
CG) Gr i). 
ACO) ACN WAGE 
(c) If $Te_i = \lambda_i e_i$, where $e_1, \ldots, e_n$ is the standard orthonormal basis of $R^n$ and $\lambda_i > 0$, then $J(T) = \lambda_1 \cdots \lambda_n$.


Prove the theorem. [You will have a little work to do because property (c) in 
this theorem appears to be quite a bit weaker than the corresponding property 
(d) of Theorem 11.12. The idea is to deal first with orthogonal transformations 
and then to get from the weaker property to the stronger by an orthogonal 
transformation. It is too bad to have two J’s around, but in fact you do not 
need the reflection at all. The present J is called the Jacobian and J is the 
traditional letter for it.] 


If $Te_i = \lambda_i e_i$, where $e_1, \ldots, e_n$ is any basis of $R^n$ (not necessarily orthonormal), then $\det T = \lambda_1 \cdots \lambda_n$.


Let $T: R^n \to R^n$ and set $p(\lambda) = \det(T - \lambda I)$. Show that $p$ is a polynomial of degree $n$ whose real zeros are the eigenvalues of $T$.


From the fact that $T$ is invertible if and only if $\det T \ne 0$, one might hope that there is a formula for $T^{-1}$ in terms of the determinant. In fact, there is a very nice formula which we can discover by calculating the determinant in a particular way. If $A$ is a matrix, then according to the definition we have

$$\det A = \sum_\pi \varepsilon(\pi)\, a_{1\pi(1)} \cdots a_{n\pi(n)}.$$

Fix an index $j$ and sum first over the permutations $\pi$ with $\pi(n) = j$, and then sum over $j$ to get

$$\det A = \sum_j a_{nj} \sum_{\pi(n) = j} \varepsilon(\pi)\, a_{1\pi(1)} \cdots a_{n-1,\pi(n-1)}.$$

If $\pi_j$ is the transposition that interchanges $n$ and $j$ and if $\mu = \pi_j\pi$, then $\mu(n) = n$, so $\mu$ is a permutation of $\{1, \ldots, n - 1\}$. Since $\pi = \pi_j\mu$, we have

$$\det A = -\sum_j a_{nj} \sum_\mu \varepsilon(\mu)\, a_{1\pi_j\mu(1)} \cdots a_{n-1,\pi_j\mu(n-1)}.$$

Let $B_j$ be the $(n - 1)$ by $(n - 1)$ matrix defined by

$$b_{km} = a_{k\pi_j(m)} \quad \text{for } 1 \le k, m \le n - 1.$$

Then

$$\det A = -\sum_j a_{nj} \det B_j.$$




LEMMA 
11.17 


THEOREM 
11.18 


Proof 


THEOREM 
11.19 


Exercise 8 


Exercise 9 
Cramer’s Rule 


At this point we have almost proved the following lemma.


If $A_{nj}$ is the matrix obtained by crossing out the $n$th row and $j$th column of $A$, then

$$\det A = \sum_j (-1)^{n+j} a_{nj} \det A_{nj}.$$


To finish the proof simply note that

$$\det B_j = (-1)^{n-j-1} \det A_{nj},$$

which follows from the fact that we get $A_{nj}$ from $B_j$ by moving the $j$th column of $B_j$ successively past the $n - j - 1$ columns that follow it.


It is customary to write $\delta_{ik}$ for the number in the $i$th row and $k$th column of the identity matrix. Thus, $\delta_{ik}$ is 1 if $i = k$ and is 0 if $i \ne k$.


If $A_{ij}$ is the matrix obtained by crossing out the $i$th row and $j$th column of $A$, then

$$\sum_j (-1)^{k+j} a_{ij} \det A_{kj} = \delta_{ik} \det A. \qquad (5)$$


First suppose that $k = i$. Then formula (5) follows from Lemma 11.17 if we move the $i$th row of $A$ successively past the $n - i$ rows that follow it. Now suppose that $k \ne i$. If we replace the $k$th row of $A$ by the $i$th row of $A$, we get a new matrix $B$ with determinant 0 (because two rows are the same). Now note that $A_{kj} = B_{kj}$ and use formula (5) to calculate the determinant of $B$.


Theorem 11.18 gives the following explicit formula for the inverse matrix.


If $\det A \ne 0$, then the element in the $j$th row and $k$th column of $A^{-1}$ is $(-1)^{k+j} \det A_{kj} / \det A$.


Use adjoints to show that

$$\sum_i (-1)^{i+k} a_{ij} \det A_{ik} = \delta_{jk} \det A. \qquad (6)$$


Let $T: R^n \to R^n$ be a linear transformation with matrix $A$. Show that if $Tx = y$, then

$$x_k \det A = \det A_k,$$

where $A_k$ is the matrix obtained from $A$ by replacing the $k$th column by the vector $y$.
[Hint: Multiply both sides of (6) by $x_j$ and sum on $j$.]
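
The formula for the inverse and Cramer's rule can be tried on a small system with exact rational arithmetic. The Python sketch below is an illustration under stated assumptions: indices are 0-based, the determinant is computed by expansion along the last row, and the function names, the matrix `A`, and the vector `y` are hypothetical choices for the example.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix A with row i and column j crossed out (0-indexed)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by expansion along the last row."""
    if not A:                       # empty matrix: take the determinant to be 1
        return 1
    n = len(A)
    # Row n-1 and column j (0-indexed) carry the sign (-1)^(n-1+j).
    return sum((-1) ** (n - 1 + j) * A[n - 1][j] * det(minor(A, n - 1, j))
               for j in range(n))

def inverse(A):
    """Entry (j, k) of the inverse is (-1)^(k+j) det A_kj / det A."""
    d, n = det(A), len(A)
    return [[Fraction((-1) ** (j + k) * det(minor(A, k, j)), d)
             for k in range(n)] for j in range(n)]

def cramer(A, y):
    """Solve Ax = y via x_k det A = det A_k, where A_k has column k replaced by y."""
    d = det(A)
    xs = []
    for k in range(len(A)):
        A_k = [row[:k] + [y[i]] + row[k + 1:] for i, row in enumerate(A)]
        xs.append(Fraction(det(A_k), d))
    return xs

A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]   # hypothetical invertible matrix
y = [4, 10, 8]                          # equals A applied to (1, 2, 3)
x = cramer(A, y)
A_inv = inverse(A)
```

Both routines cost far more than elimination for large $n$; they are meant to mirror the formulas, not to be efficient.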


10 : Linear Approximation 


1 


DEFINITION 
1.1 


DEFINITION 
1.2 




DIRECTIONAL DERIVATIVES AND PARTIAL DERIVATIVES 


The derivative of a function from $R^m$ to $R^n$ cannot be defined as it was for functions from $R^1$ to $R^n$. The difference quotient $(f(x) - f(a))/(x - a)$ is meaningless because $x - a$ is a vector. Nevertheless, the derivative in any direction can be defined.


The function $f: R^m \to R^n$ is differentiable at the point $a$ in the direction $\theta$ if the function $\varphi(t) = f(a + t\theta)$ is differentiable at $t = 0$. If this is the case, then $\varphi'(0)$ is called the directional derivative of $f$ at the point $a$ in the direction $\theta$ and is written $D_\theta f(a)$.


Equivalently,

$$D_\theta f(a) = \lim_{\substack{t \to 0 \\ t \ne 0}} \frac{f(a + t\theta) - f(a)}{t}. \qquad (1)$$


It should be clear that $a$ is a point in $R^m$ and $\theta$ is a direction in $R^m$. It is not necessary, of course, that $f$ be defined on all of $R^m$. All points near $a$ are enough. Quite often we shall write $f: R^m \to R^n$ when $f$ is only defined on a suitable subset of $R^m$, and in most cases we shall leave it to the reader to make the correction.

The directions along the coordinate axes are particularly important.


Let $e_1, \ldots, e_m$ be the natural basis of $R^m$, and let $f: R^m \to R^n$. The directional derivative in the direction $e_j$ is called the partial derivative of $f$ with respect to $x_j$ and is written $D_j f(a)$, or $\partial f(a)/\partial x_j$, or $f_{x_j}(a)$.




It will be shown in the next section that under suitable hypotheses on the function $f$ the derivative in any direction $\theta$ is expressed in terms of the partial derivatives by the formula

$$D_\theta f(a) = \sum_{j=1}^m \theta_j D_j f(a). \qquad (2)$$


Thus, the partial derivatives determine the derivatives in all directions. Furthermore, the partial derivatives can be calculated as easily as ordinary derivatives.


Exercise 1

$$D_1 f(a) = \lim_{\substack{x_1 \to a_1 \\ x_1 \ne a_1}} \frac{f(x_1, a_2, \ldots, a_m) - f(a_1, a_2, \ldots, a_m)}{x_1 - a_1}.$$


What the exercise means is this. To get the partial derivative with respect to $x_1$, fix all the other variables and differentiate the resulting function of $x_1$.


Example  Calculate the partial derivatives of

$$f(x, y) = e^{xy} \sin x.$$


To get the partial derivative with respect to $x$, we consider $y$ constant:

$$\frac{\partial f}{\partial x} = e^{xy} \cos x + y e^{xy} \sin x.$$


To get the derivative with respect to $y$, we consider $x$ constant:

$$\frac{\partial f}{\partial y} = x e^{xy} \sin x.$$
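
The rule "fix the other variables and differentiate" can be compared with a difference quotient. The Python sketch below is an illustration only. It assumes that the function of the example is $f(x, y) = e^{xy} \sin x$, as the printed derivatives suggest; the helper name `partial`, the step size, and the sample point are hypothetical choices.

```python
import math

def f(x, y):
    return math.exp(x * y) * math.sin(x)

def fx(x, y):
    """Partial derivative with respect to x, with y held constant."""
    return math.exp(x * y) * math.cos(x) + y * math.exp(x * y) * math.sin(x)

def fy(x, y):
    """Partial derivative with respect to y, with x held constant."""
    return x * math.exp(x * y) * math.sin(x)

def partial(g, point, j, h=1e-6):
    """Central difference quotient of g in the j-th coordinate direction."""
    plus, minus = list(point), list(point)
    plus[j] += h
    minus[j] -= h
    return (g(*plus) - g(*minus)) / (2 * h)

a = (0.7, -0.3)                       # hypothetical sample point
err_x = abs(partial(f, a, 0) - fx(*a))
err_y = abs(partial(f, a, 1) - fy(*a))
```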


To some extent the directional derivatives or partial derivatives serve the purpose that the ordinary derivative served for functions of one variable. For example:

THEOREM 1.3  Let $f: R^m \to R^1$ have a maximum or minimum at the point $a$. If $f$ is differentiable at $a$ in some direction $\theta$, then $D_\theta f(a) = 0$.

Proof  It is evident that if $f$ has a maximum or minimum at $a$, then the function $\varphi$ in Definition 1.1 has a maximum or minimum at 0, and the old theorem applies to $\varphi$.


The theorem is most useful if all the partial derivatives exist at $a$. It gives the $m$ equations

$$\frac{\partial f}{\partial x_1} = \cdots = \frac{\partial f}{\partial x_m} = 0 \qquad (3)$$


Exercise 2 


Exercise 3 


THEOREM 
1.4 


Exercise 1 


Exercise 2 


DEFINITION 
PeAl 




for the $m$ unknowns $x_1, \ldots, x_m$. In general, of course, these equations are not linear and are hard to solve. Moreover, the equations (3) only give the possibilities. It always has to be proved that a solution of (3) really is a maximum or minimum.


Find the maxima and minima of the function

$$f(x, y) = 2x^2 - 2xy + y^2 - 2x + 2y + 1.$$

(It may be helpful to look back at Example 2, Section 3 of Chapter 3.)


Use the definition of the directional derivative and Theorem 2.2 of Chapter 8 to prove the following theorem.


The function $f: R^m \to R^n$ is differentiable at the point $a$ in the direction $\theta$ if and only if each coordinate function is differentiable; if this is the case, then

$$D_\theta f(a) = (D_\theta f_1(a), \ldots, D_\theta f_n(a)).$$


THE DIFFERENTIAL 


While the partial derivatives are convenient in calculations, they are not at all 
convenient in theoretical questions. For one thing they are too unwieldy, and 
for another their existence implies almost nothing. 


Give an example where the partial derivatives exist, but certain directional 
derivatives do not. 


Give an example where all directional derivatives exist at a point, but the 
function is not continuous there. 


What is needed in order to prove theorems is the possibility of approximating 
the given function by a linear transformation in the following sense. 


The function $f: R^m \to R^n$ is differentiable at the point $a$ if there is a linear transformation $T: R^m \to R^n$ such that

$$\lim_{\substack{h \to 0 \\ h \ne 0}} \frac{|f(a + h) - f(a) - Th|}{|h|} = 0. \qquad (1)$$

The linear transformation $T$ is called the differential or derivative of $f$ at $a$, and is written $df(a)$, or $f'(a)$.




Remark 


THEOREM 
2.2 


Proof 


Exercise 3 


THEOREM 
2.3 


The term differential is the traditional one, and the traditional symbol is df(a); 
but the term derivative and the symbol f’(a) are becoming more common now. 
Note that differentiability requires that the function be defined on some ball 
with center a, but not, of course, on all of R”. 


The differential and the directional derivatives are related by the following formula, which, incidentally, shows that the differential is uniquely determined by formula (1).


If $f$ is differentiable at $a$, then $f$ is differentiable in every direction, and

$$D_\theta f(a) = df(a)\theta. \qquad (2)$$

Let $\theta$ be given and take $h = t\theta$, $t > 0$, in formula (1) to get

$$\lim_{\substack{t \to 0 \\ t > 0}} \left| \frac{f(a + t\theta) - f(a)}{t} - T\theta \right| = 0. \qquad (3)$$


Show that the same formula holds for $t < 0$.


Formula (3) shows that $f$ is differentiable in the direction $\theta$ and that $D_\theta f(a) = T\theta$.

Recall that the matrix of a linear transformation $T$ relative to bases $e_1, \ldots, e_m$ and $f_1, \ldots, f_n$ has the coordinates of $Te_j$ relative to $f_1, \ldots, f_n$ written down the $j$th column. When $f_1, \ldots, f_n$ is the natural basis of $R^n$, a vector is its $n$-tuple of coordinates. Moreover, in the present case, where $T$ is the differential, Theorem 2.2 shows that $Te_j = D_{e_j} f(a)$, and the latter is just the partial derivative. Therefore, we have a theorem about the matrix of the differential.


If $f$ is differentiable at $a$, then the matrix of $df(a)$ is

$$\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_m} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_m}
\end{pmatrix}$$


THEOREM 
2.4 




where the partial derivatives are calculated at $a$. Consequently,

$$df(a)h = \sum_{j=1}^m \frac{\partial f}{\partial x_j} h_j. \qquad (5)$$

The matrix $\partial f/\partial x$ is called the Jacobi matrix of $f$.


One way to remember the formula for the matrix is that to get the $j$th column you differentiate with respect to $x_j$.

Consider formula (5). If $f$ is a real-valued function, that is, $f: R^m \to R^1$, then the right-hand side is an inner product: the inner product of $h$ with the vector

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_m} \right) \qquad (6)$$

(calculated, of course, at $a$).


If $f$ is differentiable at $a$, then

$$df(a)h = (\nabla f(a), h), \qquad (7)$$

and, in particular,

$$D_\theta f(a) = (\nabla f(a), \theta). \qquad (8)$$


The vector $\nabla f$ defined by (6) is called the gradient of $f$. One convenient
way to remember formula (5) is to use (6) as a formal definition even in the
general case where $f$ goes from $\mathbb{R}^m$ to $\mathbb{R}^n$. In this case $\nabla f$ is not really a vector:
each component $\partial f/\partial x_j$ is a vector. However, the formal expression for $(\nabla f, h)$,
which is

$$(\nabla f(a), h) = \sum_{j=1}^{m} \frac{\partial f}{\partial x_j}\, h_j, \tag{9}$$

makes perfectly good sense, because $h_j$ is a number. With these definitions,
formula (5) is identical with formula (7).

Another way to remember the formulas (and probably the best one for
calculations) is like this. Write $y = f(x)$ and then $dy$ instead of $df$. Then
formula (5) becomes

$$dy_i = \sum_{j=1}^{m} \frac{\partial y_i}{\partial x_j}\, dx_j. \tag{10}$$

Notice that $\partial x_j$ and $dx_j$ "cancel" in each term. Two things are involved in
proving formula (10). In writing $dx_j$ we are, of course, thinking of $x_j$ as a
function; that is, $x_j$ is the $j$th coordinate of $x$. Thus, $x_j$ is a function from $\mathbb{R}^m$
to $\mathbb{R}^1$, so $dx_j$ is a linear function from $\mathbb{R}^m$ to $\mathbb{R}^1$. It is clear that

$$dx_j(a)h = h_j. \tag{11}$$

Therefore, the right-hand side of (10) is a linear function from $\mathbb{R}^m$ to $\mathbb{R}^1$, and
its value at $h$ is the right-hand side of (5). On the other hand, it is shown in the
next section [formula (1)] that $(dy\,h)_i = dy_i\,h$. Hence, (10) is the same formula
as (5).

Example 1  Let $f(r, \theta) = (r\cos\theta, r\sin\theta)$; that is,

$$x = r\cos\theta, \qquad y = r\sin\theta.$$

Then

$$dx = \frac{\partial x}{\partial r}\,dr + \frac{\partial x}{\partial \theta}\,d\theta = \cos\theta\,dr - r\sin\theta\,d\theta,$$

$$dy = \frac{\partial y}{\partial r}\,dr + \frac{\partial y}{\partial \theta}\,d\theta = \sin\theta\,dr + r\cos\theta\,d\theta,$$

and the Jacobi matrix is

$$\frac{\partial f}{\partial (r, \theta)} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

Exercise 4  Calculate the Jacobi matrix of $f(x, y)$

Exercise 5  Calculate the Jacobi matrix of

$$x = \cos u, \qquad y = \sin u, \qquad z = v,$$

that is, of $f(u, v) = (\cos u, \sin u, v)$.

3 EXISTENCE OF THE DIFFERENTIAL

It is the existence of the differential that is essential in proving theorems, but it
is the partial derivatives that one can get his hands on. What is needed is a
way to tell that a function is differentiable by looking at the partial derivatives.

THEOREM 3.1  If the partial derivatives of $f$ exist on a ball with center $a$ and are continuous
at $a$, then $f$ is differentiable at $a$.
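As a numerical illustration of what the theorem promises, the following minimal sketch evaluates the quotient in formula (1) of the definition, using the matrix of partial derivatives as the candidate for $T$. The sample map `f`, the point `a`, and the helper names are assumptions chosen only for this illustration; they do not come from the text.

```python
import math

def f(x, y):
    """Sample map from R^2 to R^2 (hypothetical, for illustration only)."""
    return (x * x * y, math.sin(x) + y)

def jacobi_matrix(x, y):
    """Partial derivatives of f computed by hand: row i, column j holds D_j f_i."""
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))

def remainder_ratio(a, h):
    """|f(a + h) - f(a) - T h| / |h|, where T is the matrix of partials at a."""
    T = jacobi_matrix(*a)
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    Th = (T[0][0] * h[0] + T[0][1] * h[1],
          T[1][0] * h[0] + T[1][1] * h[1])
    err = math.hypot(fah[0] - fa[0] - Th[0], fah[1] - fa[1] - Th[1])
    return err / math.hypot(*h)

a = (1.0, 2.0)
sizes = (1e-1, 1e-2, 1e-3)
# The partials of this f are continuous, so the ratios should shrink with |h|.
ratios = [remainder_ratio(a, (s, -s)) for s in sizes]
for s, r in zip(sizes, ratios):
    print(f"step {s:g}: ratio = {r:.3e}")
```

The ratios shrink roughly in proportion to $|h|$ here because this particular `f` has continuous second derivatives; the theorem itself only guarantees that they tend to 0.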




The proof will make use of the mean-value theorem, which is false for
functions with values in $\mathbb{R}^n$. Therefore, the first step is to reduce to the case
$n = 1$.

THEOREM 3.2  The function $f:\mathbb{R}^m \to \mathbb{R}^n$ is differentiable at $a$ if and only if each coordinate
function is.

Proof  Assume that $f$ is differentiable at $a$, with differential $T$. We shall show
that if $f_j$ and $T_j$ are the $j$th coordinate functions of $f$ and $T$, then

$$df_j(a) = T_j. \tag{1}$$

In other words, the differential of the $j$th coordinate function is the $j$th
coordinate function of the differential. In fact, this is obvious simply
because the absolute value of any coordinate of a vector is less than or
equal to the absolute value of the vector itself. That is,

$$\frac{|f_j(a + h) - f_j(a) - T_j h|}{|h|} \le \frac{|f(a + h) - f(a) - Th|}{|h|},$$

and by hypothesis the right side goes to 0 as $h$ goes to 0.

Exercise 1  It was tacitly assumed that $T_j$ is linear. Prove it.

Assume that each $f_j$ is differentiable, let $T_j = df_j(a)$, and let $T$ be the linear
transformation with coordinate functions $T_j$. [That is, $Tx = (T_1x, \ldots, T_nx)$;
prove that $T$ is linear.] If $\varepsilon > 0$ is given, we can find $\delta > 0$ such
that if $|h| < \delta$, then

$$\frac{|f_j(a + h) - f_j(a) - T_j h|}{|h|} < \varepsilon \quad \text{for each } j.$$

If each coordinate of a vector is smaller than $\varepsilon$, then the absolute value
of the vector itself is smaller than $\sqrt{n}\,\varepsilon$. Therefore, if $|h| < \delta$, then

$$\frac{|f(a + h) - f(a) - Th|}{|h|} < \sqrt{n}\,\varepsilon.$$


Proof of Theorem 3.1  From Theorem 1.4 it follows that the partial derivatives of each coordinate
function exist on a ball with center $a$ and are continuous at $a$; and according
to Theorem 3.2, we can deal with the coordinate functions separately.
Therefore, we can suppose from now on that $f$ is a function from $\mathbb{R}^m$ to $\mathbb{R}^1$.

In order to use the partial derivatives, we want to express the difference 
f(x) — f(a) as a sum of terms in each of which only one coordinate varies. 




If we put

$$a^j = (a_1, \ldots, a_j, x_{j+1}, \ldots, x_m),$$

then $a^0 = x$ and $a^m = a$, and we have

$$f(x) - f(a) = \sum_{j=1}^{m} \bigl( f(a^{j-1}) - f(a^j) \bigr) \tag{2}$$

because of the cancellation between each term and the next. Geometrically,
in two dimensions we go from $a$ to $x$ along the path shown in Figure 1.
We want to apply the mean-value theorem to the function

$$g_j(t) = f(a_1, \ldots, a_{j-1}, t, x_{j+1}, \ldots, x_m)$$

in order to evaluate $f(a^{j-1}) - f(a^j) = g_j(x_j) - g_j(a_j)$. If we can do so, we
shall obtain

$$f(a^{j-1}) - f(a^j) = g_j'(\xi_j)(x_j - a_j) = D_jf(\xi^j)(x_j - a_j), \tag{3}$$

where $\xi_j$ is some point between $a_j$ and $x_j$, and

$$\xi^j = (a_1, \ldots, a_{j-1}, \xi_j, x_{j+1}, \ldots, x_m).$$

This use of the mean-value theorem is justified if $g_j$ is differentiable
on the open interval from $a_j$ to $x_j$ and continuous on the corresponding
closed interval. Now, if $x$ is a point of a ball $B(a, r)$, then each $a^j$ is also
a point of the ball; for it is clear that $|a^j - a| \le |x - a|$. Moreover, a
ball has the property that if it contains two points, then it contains the
entire line segment joining them. Therefore, if $B(a, r)$ is a ball on which
the partial derivatives of $f$ exist, then each of the line segments $[a^{j-1}, a^j]$
belongs to $B(a, r)$; so the partial derivatives of $f$ exist on each such line
segment, and the existence of $D_jf$ is (by definition) what is needed for
the differentiability of $g_j$.

Figure 1

Formulas (2) and (3) with $x = a + h$ give

$$f(a + h) - f(a) = \sum_{j=1}^{m} D_jf(\xi^j)\,h_j \quad \text{if } |h| < r. \tag{4}$$

Let

$$Th = \sum_{j=1}^{m} D_jf(a)\,h_j \tag{5}$$

and let $\varepsilon > 0$ be given. Use the continuity of the partial derivatives at
the point $a$ to find $\delta < r$ such that if $|y - a| < \delta$, then

$$|D_jf(y) - D_jf(a)| < \varepsilon \quad \text{for each } j.$$

If $|h| < \delta$, this can be applied to $y = \xi^j$ to give

$$|D_jf(\xi^j) - D_jf(a)| < \varepsilon \quad \text{for each } j.$$

Subtracting formulas (4) and (5) and making use of this, we find

$$|f(a + h) - f(a) - Th| \le \sqrt{m}\,\varepsilon\,|h|,$$

which implies that $f$ is differentiable with differential $T$.

Remark  Differentiability does not imply continuity of the partial derivatives. It is
somewhere between continuity of the partial derivatives and simple existence.

Exercise 2  Write out the proof of the theorem for $m = 2$.

Exercise 3  A function $f:\mathbb{R}^1 \to \mathbb{R}^n$ is differentiable in the present sense if and only if it is
differentiable in the sense of Chapter 8. If this is the case, then

$$df(a)h = h f'(a).$$

4 COMPOSITE FUNCTIONS

The formula for the differential of a composite function is absolutely fundamental.

THEOREM 4.1  Let $\mathbb{R}^l \xrightarrow{\,f\,} \mathbb{R}^m \xrightarrow{\,g\,} \mathbb{R}^n$. If $f$ is differentiable at $a$, and $g$ is differentiable at
$b = f(a)$, then $g \circ f$ is differentiable at $a$, and

$$d(g \circ f)(a) = dg(b) \circ df(a). \tag{1}$$

In other words, the differential of the composite is the composite of the differentials.
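In matrix terms, formula (1) says that the Jacobi matrix of $g \circ f$ at $a$ is the product of the Jacobi matrix of $g$ at $b$ with that of $f$ at $a$. The sketch below checks this numerically for two sample maps. The maps `f` and `g`, the point `a`, and the central-difference helper `numeric_jacobi` are assumptions made for the illustration, not part of the text.

```python
import math

def f(x, y):                      # hypothetical f: R^2 -> R^2
    return (x * y, x + y * y)

def df(x, y):                     # Jacobi matrix of f, computed by hand
    return ((y, x), (1.0, 2 * y))

def g(u, v):                      # hypothetical g: R^2 -> R^2
    return (math.sin(u) + v, u * v)

def dg(u, v):                     # Jacobi matrix of g, computed by hand
    return ((math.cos(u), 1.0), (v, u))

def matmul(A, B):
    """Product of two matrices stored as tuples of rows."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(len(B)))
                       for j in range(len(B[0])))
                 for i in range(len(A)))

def numeric_jacobi(F, a, eps=1e-6):
    """Central differences; column j approximates the partial with respect to x_j."""
    cols = []
    for j in range(len(a)):
        up, dn = list(a), list(a)
        up[j] += eps
        dn[j] -= eps
        cols.append([(p - q) / (2 * eps) for p, q in zip(F(*up), F(*dn))])
    return tuple(tuple(cols[j][i] for j in range(len(a)))
                 for i in range(len(cols[0])))

a = (0.5, -1.5)
b = f(*a)
chain = matmul(dg(*b), df(*a))                            # dg(b) o df(a)
direct = numeric_jacobi(lambda x, y: g(*f(x, y)), a)      # d(g o f)(a), approximately
max_gap = max(abs(chain[i][j] - direct[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two matrices agree up to the error of the difference quotient, which is all a numerical check of this kind can show.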




In proving the theorem it is handy to write the condition of differentiability
in a slightly different, but obviously equivalent, form: $f$ is differentiable at $a$
with differential $S$ if and only if

$$f(a + h) = f(a) + Sh + |h|\,\epsilon(h), \quad \text{where } \epsilon(h) \to 0 \text{ as } h \to 0. \tag{2}$$

Proof of the Theorem  If $S$ is the differential of $f$ at $a$, and $T$ is the differential of $g$ at $b$, then we
have formula (2); similarly,

$$g(b + k) = g(b) + Tk + |k|\,\eta(k), \quad \text{where } \eta(k) \to 0 \text{ as } k \to 0. \tag{3}$$

Setting $k = f(a + h) - f(a)$, we get

$$\begin{aligned}
g \circ f(a + h) - g \circ f(a) &= g(b + k) - g(b) = Tk + |k|\,\eta(k) \\
&= T(Sh + |h|\,\epsilon(h)) + |k|\,\eta(k) \\
&= T \circ S\,h + |h|\,T\epsilon(h) + |k|\,\eta(k).
\end{aligned} \tag{4}$$

In accordance with the interpretation of differentiability by formula (2),
what has to be proved is that

$$T\epsilon(h) \to 0 \text{ as } h \to 0, \quad \text{and} \quad \frac{|k|\,\eta(k)}{|h|} \to 0 \text{ as } h \to 0. \tag{5}$$

The first part is clear, for

$$|T\epsilon(h)| \le \|T\|\,|\epsilon(h)| \quad \text{and} \quad \epsilon(h) \to 0.$$

As for the second, we have $k = Sh + |h|\,\epsilon(h)$, so that

$$|k| \le |h|\,(\|S\| + |\epsilon(h)|),$$

which shows first that $k \to 0$ as $h \to 0$, and then that

$$\frac{|k|\,|\eta(k)|}{|h|} \le (\|S\| + |\epsilon(h)|)\,|\eta(k)| \to 0.$$

Since the matrix of the differential is made up of the partial derivatives of
the coordinate functions, Theorem 4.1 includes the formula for the partial
derivatives of a composite function. Using Theorem 8.1 of Chapter 9 on the
matrix of a product of linear transformations, we get

THEOREM 4.2  Let $\mathbb{R}^l \xrightarrow{\,f\,} \mathbb{R}^m \xrightarrow{\,g\,} \mathbb{R}^n$. If $f$ is differentiable at $a$, and $g$ is differentiable at
$b = f(a)$, then

$$D_j(g \circ f)_i = \sum_{k=1}^{m} (D_k g_i)(D_j f_k), \tag{6}$$

where the partial derivatives of $f$ and $g \circ f$ are calculated at $a$, and those of $g$
are calculated at $b$.


The theorem is obviously impossible to remember, but in the right notation
it becomes easy. With $y = f(x)$ and $z = g(y)$ it becomes

$$\frac{\partial z_i}{\partial x_j} = \sum_{k=1}^{m} \frac{\partial z_i}{\partial y_k}\,\frac{\partial y_k}{\partial x_j}.$$

The notation is arranged so that if we think of the things that look like fractions
as fractions, then the tops and bottoms cancel out nicely.

Exercise 1  Prove the following theorem:

THEOREM 4.3  Let $\mathbb{R}^1 \xrightarrow{\,\varphi\,} \mathbb{R}^n \xrightarrow{\,f\,} \mathbb{R}^1$. If $\varphi$ is differentiable at $a$, and $f$ is differentiable at
$b = \varphi(a)$, then $(f \circ \varphi)' = (\nabla f, \varphi')$.

Example  One way to define an "$(n - 1)$-dimensional surface" in $\mathbb{R}^n$ is by an equation
$f(x) = 0$, where $f:\mathbb{R}^n \to \mathbb{R}^1$. For instance, $f(x, y) = 0$ defines a curve in the
plane, $f(x, y, z) = 0$ a two-dimensional surface in $\mathbb{R}^3$, and so on. There are
sticky problems in this, but let us play with the idea a little.

To say that a path $\varphi:\mathbb{R}^1 \to \mathbb{R}^n$ lies on the surface means that $f(\varphi(t)) = 0$
for each $t$; that is, that $f \circ \varphi = 0$. If this is so, then Theorem 4.3 says that $\varphi'(a)$
is perpendicular to $\nabla f(b)$. But $\varphi'(a)$ is the tangent vector to the path at $b$.

THEOREM 4.4  Let $f:\mathbb{R}^n \to \mathbb{R}^1$ be differentiable at the point $b$ on the surface $f(x) = 0$.
The tangent vector to any path on the surface at the point $b$ is perpendicular
to $\nabla f(b)$.

This makes it very tempting to define the tangent subspace to the surface
to be the orthogonal complement of $\nabla f(b)$, which, as we know, is an $(n - 1)$-dimensional
subspace of $\mathbb{R}^n$ as long as $\nabla f(b) \neq 0$. Only one thing is lacking,
and that is to know that every vector in $\nabla f(b)^\perp$ is tangent to some path on the
surface at $b$. Then the tangent subspace would have a pretty description as the
set of all tangent vectors to paths on the surface at $b$. This is absolutely all
right (as long as $\nabla f(b) \neq 0$), but it will require the implicit-function theorem of
Section 8 to prove it.

Exercise 2  Prove the result just discussed in the case that $f(x) = x_n - g(x_1, \ldots, x_{n-1})$.
[This would mean in the plane, for instance, that the curve is given by an
equation $y = g(x)$, instead of $f(x, y) = 0$; and in three-space that the surface
is given by an equation $z = g(x, y)$, instead of $f(x, y, z) = 0$.] Obviously, this
is already an important and interesting case.
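Theorem 4.4 can be tried out on a concrete surface. The sketch below takes the unit sphere $f(x, y, z) = x^2 + y^2 + z^2 - 1 = 0$ and one particular path on it, and evaluates the inner product $(\nabla f(\varphi(t)), \varphi'(t))$ at a few parameter values. The sphere, the path `phi`, and the sample values of `t` are assumptions chosen for the illustration.

```python
import math

def grad_f(p):
    """Gradient of f(x, y, z) = x^2 + y^2 + z^2 - 1."""
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def phi(t):
    """A hypothetical path on the unit sphere: |phi(t)| = 1 for every t."""
    return (math.cos(t) * math.cos(2 * t),
            math.sin(t) * math.cos(2 * t),
            math.sin(2 * t))

def phi_prime(t):
    """Tangent vector of the path, differentiated by hand."""
    return (-math.sin(t) * math.cos(2 * t) - 2 * math.cos(t) * math.sin(2 * t),
            math.cos(t) * math.cos(2 * t) - 2 * math.sin(t) * math.sin(2 * t),
            2 * math.cos(2 * t))

def inner(u, v):
    return sum(p * q for p, q in zip(u, v))

# Each inner product should vanish up to rounding, as Theorem 4.4 asserts.
dots = [inner(grad_f(phi(t)), phi_prime(t)) for t in (0.0, 0.3, 1.1, 2.5)]
print(dots)
```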




5 THE MEAN-VALUE THEOREM 


Although the mean-value theorem is false for functions with values in $\mathbb{R}^n$, there
is nothing wrong with it for functions from $\mathbb{R}^m$ to $\mathbb{R}^1$.

THEOREM 5.1  Let $f:\mathbb{R}^m \to \mathbb{R}^1$ be continuous on the closed segment from $a$ to $x$ and differentiable
on the open segment. Then there is a point $\xi$ between $a$ and $x$ such that

$$f(x) - f(a) = df(\xi)(x - a) = (\nabla f(\xi), x - a). \tag{1}$$

Proof  Set $h = x - a$ and $g(t) = f(a + th)$. Then $g = f \circ \varphi$, where

$$\varphi(t) = a + th; \quad \text{hence } \varphi'(t) = h.$$

Assuming there is no trouble with differentiability, we can apply the
ordinary mean-value theorem to $g$ to get

$$f(x) - f(a) = g(1) - g(0) = g'(\tau) = (\nabla f(a + \tau h), \varphi'(\tau)) = (\nabla f(\xi), h),$$

and $\xi = a + \tau h$ is between $a$ and $x = a + h$ because $\tau$ is between 0 and 1.

Exercise 1  Establish the differentiability needed in the above calculation and write out
the proof in full.

Exercise 2  If the partial derivatives of $f:\mathbb{R}^m \to \mathbb{R}^n$ exist and are identically 0 on a connected
open set $G$, then $f$ is constant on $G$.

DEFINITION 5.2  A function $f:\mathbb{R}^m \to \mathbb{R}^n$ is continuously differentiable, or $C^1$, at a point $a$ if it
is differentiable at every point near $a$, and the partial derivatives are continuous
at $a$. It is $C^1$ on an open set if it is $C^1$ at each point of the open set.

Exercise 3  Let $f:\mathbb{R}^m \to \mathbb{R}^n$ be differentiable on a set $D$, and think of $df$ as a function from
$D$ into the space $\mathcal{L}_{mn}$ of linear transformations. Show that $f$ is $C^1$ at $a$ if and
only if $D$ includes a ball with center $a$ and $df$ is continuous at $a$.

Remark  Usage varies. In some books "$C^1$ at $a$" means "$C^1$ on a neighborhood of $a$."

THEOREM 5.3  Let $f:\mathbb{R}^m \to \mathbb{R}^n$ be $C^1$ at $a$. For every $\varepsilon > 0$ there is a $\delta > 0$ such that if
$|x - a| < \delta$ and $|y - a| < \delta$, then

$$|f(x) - f(y) - df(a)(x - y)| \le \varepsilon\,|x - y|. \tag{2}$$


Proof  It is enough to prove the theorem for each coordinate function, so we can
suppose that $f:\mathbb{R}^m \to \mathbb{R}^1$. Let $\varepsilon > 0$ be given. Since the partial derivatives
are continuous at $a$, we can find $\delta > 0$ such that

$$|\nabla f(z) - \nabla f(a)| < \varepsilon \quad \text{if } |z - a| < \delta.$$

The mean-value theorem gives $f(x) - f(y) = (\nabla f(\xi), x - y)$ with $\xi$
between $x$ and $y$; therefore, $|\xi - a| < \delta$. Hence

$$|f(x) - f(y) - df(a)(x - y)| = |(\nabla f(\xi) - \nabla f(a), x - y)| \le \varepsilon\,|x - y|,$$

by virtue of Cauchy-Schwarz.

Exercise 4  Show that it is really enough to prove the theorem for each coordinate function.


The following theorem is a good example of how simple properties of the
differential reflect deep properties of the function itself. The first half can be
proved easily now. The other will have to wait until Section 7.

THEOREM 5.4  Let $f:\mathbb{R}^m \to \mathbb{R}^n$ be $C^1$ at $a$. If $df(a)$ is one to one, then $f$ itself is one to one
on some ball with center $a$. If $df(a)$ is onto, then $f$ maps each neighborhood of
$a$ onto a neighborhood of $b = f(a)$; that is, for every $r > 0$ there is an $r' > 0$
such that

$$f(B(a; r)) \supset B(b; r').$$

Proof of the First Half  According to Theorem 8.5 of Chapter 9, there is a number $m > 0$ such
that

$$|df(a)h| \ge m\,|h|.$$

Combined with formula (2), this gives

$$|f(x) - f(y)| \ge (m - \varepsilon)\,|x - y| \quad \text{if } |x - a| < \delta \text{ and } |y - a| < \delta, \tag{3}$$

and we have only to take $\varepsilon < m$.

Exercise 5  In the first half of the theorem it must be true that $m \le n$, and in the second half
that $m \ge n$.

Exercise 6  What does a function like the one in Figure 2 show about the necessity of the
hypothesis that $f$ is $C^1$ at $a$ in Theorem 5.4?


Figure 2


6 FIXED-POINT THEOREM

A fixed point of a function $F:X \to X$ is a point $x \in X$ such that $F(x) = x$.
There are two or three fundamental theorems that assert that fixed points exist
under suitable conditions. One of them (the easy one) is proved here. It is
not immediately apparent why such theorems are fundamental, but we shall
give an application of obvious interest in Section 7.

THEOREM 6.1  Let $X$ be a complete metric space, and let $F:X \to X$ have the property that

$$d(F(x), F(y)) \le M\,d(x, y) \quad \text{with } M < 1. \tag{1}$$

Then there is one and only one point $x \in X$ such that $F(x) = x$.

Proof  It is apparent that there cannot be more than one fixed point, for if $x$ and $y$
are both fixed points, then formula (1) gives $d(x, y) \le M\,d(x, y)$ with
$M < 1$. The problem is to show that there is at least one.

Let $x_0$ be any point of $X$. Set $x_1 = F(x_0)$, $x_2 = F(x_1)$, and in general
$x_n = F(x_{n-1})$. If the sequence $\{x_n\}$ converges, say $x_n \to x$, then it follows
that

$$F(x_n) \to F(x) \quad \text{and} \quad F(x_n) = x_{n+1} \to x. \tag{2}$$

Since a sequence can converge to at most one point, it must be true that
$F(x) = x$, and the theorem is proved.


Exercise 1  The first part of formula (2) depends on the fact that $F$ is continuous. Indeed,
$F$ is uniformly continuous; given $\varepsilon > 0$, one can take $\delta = \varepsilon$.

The problem that remains is to show that the sequence $\{x_n\}$ does converge.
Notice that from property (1) it follows that

$$\begin{aligned}
d(x_1, x_2) &= d(F(x_0), F(x_1)) \le M\,d(x_0, x_1), \\
d(x_2, x_3) &= d(F(x_1), F(x_2)) \le M\,d(x_1, x_2) \le M^2 d(x_0, x_1), \\
d(x_3, x_4) &= d(F(x_2), F(x_3)) \le M\,d(x_2, x_3) \le M^3 d(x_0, x_1),
\end{aligned}$$

and in general that

$$d(x_k, x_{k+1}) \le M^k d(x_0, x_1).$$

If $n > m$, then by the triangle inequality we have

$$d(x_m, x_n) \le \sum_{k=m}^{n-1} d(x_k, x_{k+1}) \le d(x_0, x_1) \sum_{k=m}^{n-1} M^k. \tag{3}$$

If $m$ and $n$ are large enough, the term on the right is as small as we please,
for the series

$$\sum_{k=0}^{\infty} M^k$$

converges when $M < 1$. This shows that the sequence $\{x_n\}$ is Cauchy,
so it must converge, as $X$ is complete.

Exercise 2  Show that the sum in (3) is equal to

$$\frac{M^m - M^n}{1 - M},$$

and hence that

$$d(x_m, x) \le \frac{M^m}{1 - M}\,d(x_0, x_1). \tag{4}$$

Formula (4) is interesting in numerical work. If the point $x_m$, which is
obtained by explicit calculation, is considered as an approximation to the fixed
point $x$, then formula (4) gives a bound for the error.
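The iteration in the proof and the bound $d(x_m, x) \le \frac{M^m}{1 - M}\,d(x_0, x_1)$ of formula (4) can both be tried out directly. The sketch below uses $F = \cos$ on the complete space $X = [0, 1]$, which maps $X$ into itself and satisfies property (1) with $M = \sin 1$ by the ordinary mean-value theorem. This choice of $F$, the starting point, and the function names are assumptions made for the illustration.

```python
import math

def iterate(F, x0, steps):
    """Return the list x_0, x_1, ..., x_steps with x_n = F(x_{n-1})."""
    xs = [x0]
    for _ in range(steps):
        xs.append(F(xs[-1]))
    return xs

# Assumed example: F = cos on X = [0, 1]; |F'(x)| = |sin x| <= sin 1 < 1 there.
M = math.sin(1.0)
xs = iterate(math.cos, 0.0, 200)
fixed = xs[-1]                      # numerically converged fixed point
d01 = abs(xs[1] - xs[0])            # d(x_0, x_1)

def a_priori_bound(m):
    """Right-hand side of formula (4): M^m / (1 - M) * d(x_0, x_1)."""
    return M ** m / (1 - M) * d01

for m in (5, 10, 20):
    print(m, abs(xs[m] - fixed), a_priori_bound(m))
```

In this example the actual error is well below the bound, which is typical: formula (4) uses only the worst-case constant $M$ and the first step.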


7 THE INVERSE-FUNCTION THEOREM

The inverse-function theorem is one of the prettiest and most important theorems
of calculus. It says roughly that to decide whether a function $f:\mathbb{R}^n \to \mathbb{R}^n$ has
an inverse, you just look at the differential. If the linear function $df(a)$ has
an inverse, then $f$ itself has an inverse at least on a neighborhood of $b = f(a)$.
Equivalently, the equations $f(x) = y$ have a unique solution for each $y$ near $b$ if
the linear equations $df(a)x = y$ do.

THEOREM 7.1 (Inverse-Function Theorem)  Let $f:\mathbb{R}^n \to \mathbb{R}^n$ be $C^1$ at $a$. If $df(a)$
is invertible, then $f$ itself is locally invertible in the sense that there is a function
$\varphi$ which is defined on a neighborhood of $b = f(a)$, is differentiable at $b$, and
satisfies

$$f \circ \varphi = I \quad \text{and} \quad \varphi \circ f = I.$$

If $f$ is $C^1$ on a neighborhood of $a$, then $\varphi$ is $C^1$ on a neighborhood of $b$.


Before turning to the proof, let us discuss the theorem. The hypothesis
that $df(a)$ is invertible is plainly necessary, for if $f \circ \varphi = I$ and $\varphi \circ f = I$, then
the chain rule for composite functions shows that

$$df(a) \circ d\varphi(b) = I \quad \text{and} \quad d\varphi(b) \circ df(a) = I.$$

Incidentally, this shows why we consider functions from $\mathbb{R}^n$ to $\mathbb{R}^n$ rather than
functions from $\mathbb{R}^m$ to $\mathbb{R}^n$ with $m \neq n$. A linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$
can never be both one to one and onto if $m$ and $n$ are different. (There is still
something that can be said, though, as we shall see later.)

It is natural to ask whether there is a reason for singling out a particular
point $a$ and claiming only that $f$ is locally invertible on a neighborhood of
$b = f(a)$. Suppose that $f$ is $C^1$ on an open set $G$ and that $df(a)$ is invertible
for every $a \in G$. Is it then true that $f$ has an inverse that is defined on the
whole set $f(G)$? Equivalently, is it true that $f$ is one to one on the whole set
$G$? In dimension 1 we know that the answer is "yes" if $G$ is an interval (why?);
but in higher dimensions the answer is "no." Consider the function $f:\mathbb{R}^2 \to \mathbb{R}^2$
defined by

$$f(z) = z^2 \quad \text{(complex multiplication)}.$$

It is easily checked that

$$df(a)h = 2ah,$$

so $df(a)$ is invertible for each $a \neq 0$, that is, for each $a$ in the open set
$G = \mathbb{R}^2 - \{0\}$. On the other hand, $f$ is not one to one on $G$, for every complex
number $w \neq 0$ has two square roots: if one of them is $z$, then the other
is $-z$.

To look at things a little more closely, let $H_+$ and $H_-$ be the open half-planes
bounded by a line through 0. It is plain that if $z$ lies in $H_+$, then $-z$
lies in $H_-$, and therefore that $f$ is one to one on both $H_+$ and $H_-$ and that

$$f(H_+) = f(H_-).$$
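The two facts used in this example, that $df(a)h = 2ah$ and that $f$ is not one to one on $G$, can be checked with Python's built-in complex numbers. The particular values of `a` and `h` below are assumptions chosen for the illustration; the determinant is that of the real $2 \times 2$ Jacobi matrix of $(x, y) \mapsto (x^2 - y^2, 2xy)$.

```python
def f(z):
    """Complex squaring, viewed as a map from R^2 to R^2."""
    return z * z

def df(a, h):
    """The differential of f at a applied to h: complex multiplication by 2a."""
    return 2 * a * h

a = complex(1.0, 2.0)
h = complex(1e-4, -2e-4)

# f(a + h) - f(a) - 2ah equals h^2 exactly, so the quotient in (1) is |h|.
ratio = abs(f(a + h) - f(a) - df(a, h)) / abs(h)

# Jacobi matrix of (x, y) -> (x^2 - y^2, 2xy) is [[2x, -2y], [2y, 2x]].
x, y = a.real, a.imag
det = (2 * x) * (2 * x) - (-2 * y) * (2 * y)      # equals 4|a|^2, nonzero for a != 0

print(ratio, det, f(a) == f(-a))
```

The determinant is nonzero at every $a \neq 0$, yet `f(a)` and `f(-a)` coincide, which is exactly the gap between local and global invertibility discussed above.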


Calling this set $K$, we have two local inverses:

$$\varphi_+:K \to H_+ \quad \text{and} \quad \varphi_-:K \to H_- \quad (\varphi_- = -\varphi_+).$$

To be a little more explicit, let $a$ be a given point $\neq 0$, and let $b = f(a)$.
Let $H_+$ be the half-space that contains $a$ and is bounded by the line through
0 that is perpendicular to $a$ (Figure 3). Then $f(H_+) = \mathbb{R}^2 - l$, where $l$ is the
half-line determined by $-b$, so $\varphi_+$ is defined on $\mathbb{R}^2 - l$. $\varphi_+$ is the continuous
square-root function that is defined on this set and takes the value $a$ at the
point $b$. Note that if $w$ is close to $-b$ and on one side of $l$, then $\varphi_+(w)$ is close
to $ia$; while if $w$ is close to $-b$ and on the other side of $l$, then $\varphi_+(w)$ is close to
$-ia$. This suggests (correctly) that there is no continuous square-root function
on a ring around the origin (Figure 4). Nevertheless, there is a continuous
square-root function on the set shown in Figure 5. Try to give a formula in
terms of $\varphi_+$ and $\varphi_-$.

Figure 3

Figure 4

Figure 5

The question of what are the "biggest" sets on which the local inverses
are defined is a complicated one. The inverse-function theorem does not
attempt to answer it, but merely maintains that the restriction of $f$ to some
neighborhood of $a$ has an inverse that is defined on some neighborhood of
$b = f(a)$ and has suitable differentiability properties. We shall see that even
this weaker kind of result is very powerful.

Remark  A few years ago some questions of this kind came up in my own research, and
I thought it might be helpful to see what Jacobi himself had to say on the subject
of Jacobians and the inverse-function theorem. I dug back into the dusty old
pages, and found to my considerable disgust that the paper was written in
Latin. However, I had fancied myself pretty good in high school Latin, and
also I thought that the formulas should provide plenty of clues. Whether I
was good in high school Latin, I am no longer at all sure. As for the formulas,
neatly displayed in the center of each page was

$$y = f(x)$$

and no other. So I have no idea what Jacobi himself had to say on the subject
of Jacobians and the inverse-function theorem (and I did not succeed in solving
the problem).


[Illustration: Karl Jacobi]


Now let us turn to the proof. It is convenient to begin with the special
case in which $df(a) = I$. The calculations are a little easier, and they give
more explicit information that is useful for other purposes (e.g., changes of
variable in multiple integrals). This more explicit information is contained
in the following theorem.

THEOREM 7.2  Let $g:\mathbb{R}^n \to \mathbb{R}^n$ be $C^1$ at $a$, and let $g(a) = b$ and $dg(a) = I$. For every
$\varepsilon$, $0 < \varepsilon < 1$, there is a $\delta > 0$ such that if $r < \delta$, then

$$B(b; (1 + \varepsilon)r) \supset g(B(a; r)) \supset B(b; (1 - \varepsilon)r). \tag{1}$$

Proof  If $\varepsilon$ is given, recall that $dg(a) = I$, and choose $\delta > 0$ so that (Theorem 5.3)

$$|g(x) - g(y) - (x - y)| \le \varepsilon\,|x - y| \quad \text{if } x, y \in B(a; \delta). \tag{2}$$

Then, in particular,

$$(1 - \varepsilon)\,|x - y| \le |g(x) - g(y)| \le (1 + \varepsilon)\,|x - y| \quad \text{if } x, y \in B(a; \delta). \tag{3}$$

The left side of this inequality shows that $g$ is one to one on $B(a; \delta)$.
And the right side, with $y = a$, shows that

$$g(B(a; r)) \subset B(b; (1 + \varepsilon)r),$$

which is the first half of formula (1).

To prove the second half of formula (1), we shall use the fixed-point
theorem with

$$X = \bar{B}(a; r) \quad \text{and} \quad F(x) = -g(x) + x + z,$$

where $\bar{B}(a; r)$ is the closed ball and $z$ is any point of $B(b; (1 - \varepsilon)r)$. It is plain that

$$F(x) = x \quad \text{if and only if} \quad g(x) = z,$$

so all we have to prove is that the fixed-point theorem is applicable,
that is, that $X$ is complete (and we do know that a closed ball is complete),
and that

$$F:\bar{B}(a; r) \to \bar{B}(a; r) \tag{4}$$

and

$$|F(x) - F(y)| \le M\,|x - y| \quad \text{with } M < 1. \tag{5}$$

Formula (5) is the same as (2), with $M = \varepsilon$, for $z$ cancels in the difference.
And formula (4) follows from (2), with $y = a$, for

$$|F(x) - a| = |-g(x) + b + x - a + z - b| \le \varepsilon\,|x - a| + |z - b| < \varepsilon r + (1 - \varepsilon)r = r.$$

Note that the strict inequality in the middle shows that $F:\bar{B}(a; r) \to B(a; r)$,
so the fixed point $x$ lies in the open ball, as required.

Now let us do the differentiability of the inverse of $g$. The function
$h = g^{-1}$ is defined on the ball $B(b; (1 - \varepsilon)\delta)$, and if $z$ and $w$ are any two
points of this ball, then formulas (2) and (3) [with $x = h(z)$ and $y = h(w)$]
give

$$|h(z) - h(w) - (z - w)| \le \frac{\varepsilon}{1 - \varepsilon}\,|z - w| \quad \text{if } z, w \in B(b; (1 - \varepsilon)\delta). \tag{6}$$

This obviously implies that $h$ is differentiable at $b$.

Exercise 1  Prove the last statement, and be careful that your proof does not show that $h$
is differentiable at an arbitrary $w \in B(b; (1 - \varepsilon)\delta)$.
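The proof of Theorem 7.2 is constructive: the solution of $g(x) = z$ is the limit of the iterates of $F(x) = -g(x) + x + z$. The sketch below runs that iteration for one sample map with $g(0) = 0$ and $dg(0) = I$. The map `g`, the target `z`, the radius 0.5, and the step count are assumptions for the illustration; on the ball of radius 0.5 this `g` satisfies formula (2) with $\varepsilon = 0.2$, and `z` lies within $(1 - \varepsilon)r = 0.4$ of $b = 0$.

```python
import math

def g(p):
    """Hypothetical map with g(0) = 0 and dg(0) = I."""
    x, y = p
    return (x + 0.2 * y * y, y + 0.2 * x * x)

def solve_by_iteration(z, a=(0.0, 0.0), steps=60):
    """Iterate F(x) = -g(x) + x + z starting from a, as in the proof of Theorem 7.2."""
    x = a
    for _ in range(steps):
        gx = g(x)
        x = (x[0] - gx[0] + z[0], x[1] - gx[1] + z[1])
    return x

z = (0.2, -0.1)                      # a point with |z - b| < 0.4, where b = g(0) = 0
x = solve_by_iteration(z)
gx = g(x)
residual = math.hypot(gx[0] - z[0], gx[1] - z[1])
print(x, residual)
```

Each step needs only an evaluation of `g`, never its derivative or inverse, which is what makes the contraction argument convenient in practice.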


Proof of Theorem 7.1  Let $T = df(a)$ and $g = T^{-1} \circ f$. The composite-function theorem shows
that $dg(a) = I$, so everything proved above is applicable. The function
$\varphi = h \circ T^{-1}$ is the one sought in the theorem. Indeed,

$$\begin{aligned}
f \circ \varphi &= T \circ g \circ h \circ T^{-1} = T \circ I \circ T^{-1} = I, \\
\varphi \circ f &= h \circ T^{-1} \circ T \circ g = h \circ I \circ g = I,
\end{aligned}$$

and $\varphi$ is differentiable because both $h$ and $T^{-1}$ are.

It remains to prove the last part of the theorem to the effect that if
$f$ is $C^1$ not just at $a$, but on a neighborhood of $a$, then $\varphi$ is $C^1$ on a neighborhood
of $b$. First of all, it is clear that $\varphi$ is differentiable on a neighborhood
of $b$, because what has been proved can be applied at each point of
a neighborhood of $a$. Now, look at $df$ and $d\varphi$ as functions from $\mathbb{R}^n$ to
the space $\mathcal{L}_{nn}$ of linear transformations. From this point of view the
composite function theorem says

$$d\varphi = \mathcal{R} \circ df \circ \varphi, \tag{7}$$

where $\mathcal{R}$ is the inverse function; that is, $\mathcal{R}(T) = T^{-1}$ for any invertible
$T \in \mathcal{L}_{nn}$. Now, $\varphi$ is continuous because it is differentiable, $df$ is continuous
by hypothesis, and $\mathcal{R}$ is continuous by Theorem 8.8 of Chapter 9.
Consequently, $d\varphi$ is continuous.

Exercise 2  Check formula (7). In chasing around composite functions it is convenient
to make a diagram of the following sort:

$$\begin{array}{ccc}
\mathcal{L}_{nn} & \xrightarrow{\ \mathcal{R}\ } & \mathcal{L}_{nn} \\
{\scriptstyle df}\,\big\uparrow & & \big\uparrow\,{\scriptstyle d\varphi} \\
\mathbb{R}^n & \xleftarrow{\ \varphi\ } & \mathbb{R}^n
\end{array}$$

The assertion of formula (7) is that the arrows can be followed around in
either direction with the same result.

Example  Consider the function $f:\mathbb{R}^2 \to \mathbb{R}^2$ given by

$$x = r\cos\theta, \qquad y = r\sin\theta. \tag{8}$$

Ordinarily, these are interpreted geometrically as the equations between rectangular
and polar coordinates in the same geometric plane, but they can also
be interpreted as defining a function from $\mathbb{R}^2$ to $\mathbb{R}^2$ which fits into the present
scheme. The Jacobi matrix is

$$\begin{pmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[2ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

The determinant of this matrix is $r$, so the differential is invertible if and only
if $r \neq 0$. The inverse-function theorem gives the following: Take any $(r_0, \theta_0)$
with $r_0 \neq 0$ and the corresponding $(x_0, y_0)$. If $(x, y)$ is close enough to $(x_0, y_0)$,
then the equations (8) have a unique solution $r = \varphi(x, y)$, $\theta = \psi(x, y)$ which
is close to $(r_0, \theta_0)$, and the functions $\varphi$ and $\psi$ are $C^1$ on a neighborhood of
$(x_0, y_0)$.

Exercise 3  In this polar coordinate example, analyze the "global" existence of the local
inverse as was done in the example $f(z) = z^2$.
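For the polar coordinate example, one local inverse can be written down explicitly and compared with what the theorem predicts. The sketch below assumes $r_0 > 0$ and $-\pi < \theta_0 < \pi$, so that `math.hypot` and `math.atan2` give the branch of $(\varphi, \psi)$ near $(r_0, \theta_0)$; the sample point, the perturbation, and the function names are likewise assumptions for the illustration.

```python
import math

def f(r, theta):
    """Equations (8): the map (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobi(r, theta):
    """Jacobi matrix of f, as computed in the example."""
    return ((math.cos(theta), -r * math.sin(theta)),
            (math.sin(theta), r * math.cos(theta)))

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def local_inverse(x, y):
    """One branch of the inverse, valid near points with r0 > 0 and |theta0| < pi."""
    return (math.hypot(x, y), math.atan2(y, x))

r0, theta0 = 2.0, 0.7
x0, y0 = f(r0, theta0)

# Solve equations (8) for a point (x, y) close to (x0, y0).
x, y = x0 + 1e-3, y0 - 1e-3
r, theta = local_inverse(x, y)
back = f(r, theta)
print(det2(jacobi(r0, theta0)), (r, theta), back)
```

The branch chosen here breaks down along the negative $x$-axis, where `atan2` jumps; this is one concrete way to see why the inverse is only local.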




Exercise 4  Discuss the equations $x = u^2 - v^2$, $y = uv$, and also the equations $x = u^2 + v^2$,
$y = uv$.

Exercise 5  It is often convenient to study a surface

$$x_n = g(x_1, \ldots, x_{n-1})$$

by flattening it out with the transformation

$$y_i = x_i \text{ for } i < n \quad \text{and} \quad y_n = x_n - g(x_1, \ldots, x_{n-1}).$$

Show that the differential is invertible and calculate the Jacobi matrix. Discuss
the transformation. What happens to the surface?

One part of the inverse function theorem is all right for functions from
$\mathbb{R}^m$ to $\mathbb{R}^n$ with $m > n$. The result was stated initially as Theorem 5.4.

THEOREM 7.3  Let $f:\mathbb{R}^m \to \mathbb{R}^n$ be $C^1$ at $a$. If $df(a)$ maps $\mathbb{R}^m$ onto $\mathbb{R}^n$, then $f$ itself maps
every neighborhood of $a$ onto a neighborhood of $b = f(a)$. In fact, there is
a function $\varphi:\mathbb{R}^n \to \mathbb{R}^m$ which is defined on a neighborhood of $b$, is differentiable
at $b$, and satisfies $f \circ \varphi = I$. If $f$ is $C^1$ on a neighborhood of $a$, then
$\varphi$ is $C^1$ on a neighborhood of $b$.

Proof  Since the dimension of the range of a linear transformation is the number
of linearly independent columns in its matrix, it follows that $\partial f/\partial x$ has $n$
linearly independent columns. Suppose for simplicity of notation that
they are the first $n$, and define $g:\mathbb{R}^n \to \mathbb{R}^n$ by

$$g(x) = f(x, a_{n+1}, \ldots, a_m).$$

Now apply the inverse-function theorem to $g$.

Exercise 6  The function $\varphi$ of the theorem is not precisely the local inverse of $g$.
What is it? Is there just one function $\varphi$ that satisfies the conditions of the
theorem? Draw a picture.

Exercise 7  It is tempting to conjecture a corresponding theorem on the existence of a
"left inverse," a function $\varphi$ such that $\varphi \circ f = I$. State the conjecture, and
disprove it by using a figure six.

Exercise 8  Let $f:\mathbb{R}^m \to \mathbb{R}^n$ be $C^1$ on an open set $G$. If at each point of $G$, $df$ maps $\mathbb{R}^m$
onto $\mathbb{R}^n$, then $f(G)$ is open in $\mathbb{R}^n$.


8 THE IMPLICIT-FUNCTION THEOREM

The inverse-function theorem involves the solution of equations $y = f(x)$, or
equivalently $y - f(x) = 0$. The implicit-function theorem involves the solution
of equations that appear more general at first, that is, $F(x, y) = 0$, but
reduce quite easily to the former. Here $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, and the point
is to solve for $y$ when $x$ is given. In order to have a decent expectation of a
unique solution, there should be as many equations as unknowns, so $F$ should
be a function from $\mathbb{R}^{m+n}$ to $\mathbb{R}^n$. If the equation $F(x, y) = 0$ is written out in
full in terms of all the coordinates, it looks like this:

$$\begin{aligned}
F_1(x_1, \ldots, x_m, y_1, \ldots, y_n) &= 0 \\
&\;\;\vdots \\
F_n(x_1, \ldots, x_m, y_1, \ldots, y_n) &= 0
\end{aligned} \tag{1}$$

which shows mainly that there is good reason for writing $F(x, y) = 0$ instead,
and for introducing some suitable additional notation.

DEFINITION 8.1  If $F:\mathbb{R}^{m+n} \to \mathbb{R}^n$, then $d_yF$ is the differential of the function from $\mathbb{R}^n$ to $\mathbb{R}^n$
obtained by fixing $x$, and $\partial F/\partial y$ is the corresponding Jacobi matrix. Thus,

$$\frac{\partial F}{\partial y} = \begin{pmatrix}
\dfrac{\partial F_1}{\partial y_1} & \cdots & \dfrac{\partial F_1}{\partial y_n} \\
\vdots & & \vdots \\
\dfrac{\partial F_n}{\partial y_1} & \cdots & \dfrac{\partial F_n}{\partial y_n}
\end{pmatrix}.$$

THEOREM 8.2 (Implicit-Function Theorem)  Let $F:\mathbb{R}^{m+n} \to \mathbb{R}^n$ be $C^1$ at a point
$(a, b)$ with $F(a, b) = 0$. If $d_yF(a, b)$ is invertible, there are positive
numbers $\varepsilon$ and $\delta$ such that
(a) If $|x - a| < \delta$, then there is one and only one point $y = \varphi(x)$
satisfying

$$|y - b| < \varepsilon \quad \text{and} \quad F(x, y) = 0.$$

(b) The function $\varphi$ is differentiable at $a$ and satisfies

$$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\,\frac{\partial \varphi}{\partial x} = 0, \quad \text{or} \quad \frac{\partial F_i}{\partial x_j} + \sum_{k=1}^{n} \frac{\partial F_i}{\partial y_k}\,\frac{\partial \varphi_k}{\partial x_j} = 0. \tag{2}$$

(c) If $F$ is $C^1$ on a neighborhood of $(a, b)$, then $\varphi$ is $C^1$ on a neighborhood
of $a$.


The partial derivatives of $\varphi$ are evaluated at $a$, those of $F$ at $(a, b)$. Formula (2)
results from differentiating the equation $F(x, \varphi(x)) = 0$. Note that
this formula gives the means to calculate the partial derivatives of $\varphi$, since the
matrix $\partial F/\partial y$ is invertible.

Exercise 1  Verify formula (2).

Proof  The theorem is proved by using the inverse-function theorem on the
function $f:\mathbb{R}^{m+n} \to \mathbb{R}^{m+n}$ defined by

$$f(x, y) = (x, F(x, y)). \tag{3}$$

The equation $F(x, y) = 0$ is equivalent to the equation $f(x, y) = (x, 0)$.
Assume for the moment that $df(a, b)$ is invertible, so $f$ has a local inverse
$\psi$ defined on some neighborhood of $(a, 0)$. If $x$ is near $a$, then there is
one and only one solution of the equation $f(x, y) = (x, 0)$ which is near
$(a, b)$, namely $\psi(x, 0)$. In other words, if $J$ and $P$ are the functions

$$J:\mathbb{R}^m \to \mathbb{R}^{m+n}, \quad J(x) = (x, 0),$$

$$P:\mathbb{R}^{m+n} \to \mathbb{R}^n, \quad P(x, y) = y,$$

then

$$\varphi = P \circ \psi \circ J \tag{4}$$

is the function sought in the theorem. Since $J$ and $P$ are undoubtedly
differentiable, and by Theorem 7.1 $\psi$ is too, it follows that $\varphi$ is differentiable.
So all depends upon showing that $df(a, b)$ is invertible.

The matrix of $df(a, b)$ is

$$\begin{pmatrix} I & 0 \\[1ex] \dfrac{\partial F}{\partial x} & \dfrac{\partial F}{\partial y} \end{pmatrix},$$

where $I$ stands for the $m$ by $m$ identity matrix, and 0 stands for the $m$ by
$n$ zero matrix. Therefore,

$$df(a, b)(h, k) = (h, d_xF\,h + d_yF\,k).$$

It follows that if $df(a, b)(h, k) = 0$, then first $h = 0$, and then $d_yF\,k = 0$.
But then also $k = 0$, since $d_yF$ is one to one. Thus, $df(a, b)$ is invertible.

Exercise 2  There are $\varepsilon$'s and $\delta$'s in the statement of Theorem 8.2, but not in the proof.
Write out the proof in enough detail to produce them.


The following version of the implicit-function theorem can be regarded 


as an improvement of Theorem 7.3. 


THEOREM 8.3
Let $F: R^{m+n} \to R^n$ be $C^1$ at (a, b), and let F(a, b) = c. If $d_yF(a, b)$
is invertible, then there are positive numbers $\varepsilon$ and $\delta$ such that

(a) If $|x - a| < \delta$ and $|z - c| < \delta$, then there is one and only one
point $y = \varphi(x, z)$ satisfying

$$|y - b| < \varepsilon \quad \text{and} \quad F(x, y) = z.$$

(b) The function $\varphi$ is differentiable at (a, c) and satisfies

$$\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\frac{\partial \varphi}{\partial x} = 0, \qquad \frac{\partial F}{\partial y}\frac{\partial \varphi}{\partial z} = I.$$

(c) If F is $C^1$ on a neighborhood of (a, b), then $\varphi$ is $C^1$ on a
neighborhood of (a, c).

The theorem is proved by applying the implicit-function theorem to the
function $G: R^{m+n+n} \to R^n$ defined by

$$G(x, y, z) = F(x, y) - z.$$

The point (x, z) plays the role that x played before and (a, c) the role that a
played.

Exercise 3
Instead of deducing Theorem 8.3 from the implicit-function theorem, prove
it directly by going back over the proof of the implicit-function theorem.

How is Theorem 8.3 an improvement on Theorem 7.3? It shows not
only that F maps every neighborhood of (a, b) onto a neighborhood of c, but
also that the points that map onto a given point z form an "m-dimensional
surface" given by an equation $y = \varphi(x, z)$ and that the surface varies
differentiably as z varies. [This refers, however, not to all the points that map
onto z, but to those in a neighborhood of (a, b).]

What about the hypotheses? In Theorem 7.3 the hypothesis is that dF
maps $R^{m+n}$ onto $R^n$, while in Theorem 8.3 it is that $d_yF$ is invertible. The
two look different, but the difference is more one of appearance than of
substance. If dF maps $R^{m+n}$ onto $R^n$, then its matrix must have n linearly
independent columns. If the coordinates are relabeled so that these columns
correspond to the y's and the others to the x's, then the two hypotheses are
exactly the same.


Example
Let $F(x, y) = x^2 + y^2 - 1$, so that F(x, y) = 0 is the equation of the unit circle
in the plane. We have

$$\frac{\partial F}{\partial x} = 2x, \qquad \frac{\partial F}{\partial y} = 2y,$$

and, consequently,

$$d_xF(a, b)h = 2ah, \qquad d_yF(a, b)k = 2bk, \qquad dF(a, b)(h, k) = 2ah + 2bk.$$

The implicit-function theorem says that the equation $x^2 + y^2 = 1$ can be
solved for y as a function of x in a neighborhood of any point (a, b) with
$a^2 + b^2 = 1$ and $b \ne 0$. A glance at the graphs shows that it cannot be solved
for y in any neighborhood of either of the points $(\pm 1, 0)$. The equation can
be solved for x in a neighborhood of any (a, b) with $a^2 + b^2 = 1$ and $a \ne 0$.

From the point of view of Theorem 8.3, we see that the points which satisfy
F(x, y) = z form a nice curve (the circle with center 0 and radius $\sqrt{1 + z}$) as
long as $z > -1$. For $z < -1$ there are no such points, and for $z = -1$
there is just the one point (0, 0), which is the one point at which dF does not
map $R^2$ onto $R^1$.
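
As a numerical illustration of the circle example, the following minimal Python sketch (not from the text) computes the local solution $y = \varphi(x)$ near a point (a, b) with $b \ne 0$ by Newton's method in y. The name `solve_y`, the tolerance, and the sample point (0.6, 0.8) are assumptions chosen for illustration; Newton's method is simply one convenient way to compute the solution whose existence the theorem guarantees, and it is not the method of the proof.

```python
import math

def solve_y(x, y0, tol=1e-12, max_iter=50):
    """Newton's method in y for F(x, y) = x**2 + y**2 - 1 = 0, started at y0."""
    y = y0
    for _ in range(max_iter):
        F = x * x + y * y - 1.0
        dFdy = 2.0 * y  # d_y F; the iteration needs this to be nonzero
        if dFdy == 0.0:
            raise ZeroDivisionError("d_y F vanishes; cannot solve for y here")
        step = F / dFdy
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

# Hypothetical base point on the circle with b != 0.
a, b = 0.6, 0.8

def phi(x):
    """Local solution y = phi(x) near (a, b), started from y = b."""
    return solve_y(x, b)

print(phi(0.5), math.sqrt(1 - 0.5 ** 2))  # the two values should agree closely
```

Starting the same iteration at y = 0 (that is, at one of the points $(\pm 1, 0)$) fails immediately, which mirrors the failure of the hypothesis $d_yF \ne 0$ there.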


Exercise 4
Discuss the solution of the equation $xy^2 - 2y + 1 = 0$. Draw a picture of the
set of points in the plane that satisfy the equation.

Exercise 5
Discuss the solution of the equation $y^2 - x^3 = 0$ and draw the picture. Notice
that near the point (0, 0) the equation can be solved for x but not for y. The
solution for x is not differentiable. At this point both $d_xF$ and $d_yF$ are 0.




11 : Surfaces 


1 


ALGEBRAIC CURVES 


A (real) algebraic curve is the set of points in the plane satisfying an equation
F(x, y) = 0, where F is a polynomial:

$$F(x, y) = \sum_{j,k=0}^{n} a_{jk}\, x^j y^k. \tag{1}$$

It is a good exercise to try to get some information about such curves by using
the implicit-function theorem and some of the other basic results of calculus.
There are two kinds of points x = a that cause mischief. The first is
obvious from the implicit-function theorem. If the equations

$$F(a, y) = 0 \quad \text{and} \quad \frac{\partial F}{\partial y}(a, y) = 0 \tag{2}$$

have a common solution y = b, then on the one hand the point (a, b) is on
the curve, but on the other we cannot expect to be able to solve for y as a
function of x.

The second kind is less obvious. In formula (1) collect together all the
terms with a given power of y and write

$$F(x, y) = \sum_{k=0}^{n} A_k(x)\, y^k. \tag{3}$$

Each $A_k$ is then a polynomial in x. The second points that cause mischief are
the points a such that

$$A_n(a) = 0, \tag{4}$$

that is, the points where the coefficient of the highest power of y vanishes.
The reason these must be avoided will be apparent in the proofs.


In order to start out, we shall need two theorems from algebra to show
that these troublemakers are only finite in number.

THEOREM 1.1
The polynomial

$$p(x) = \sum_{j=0}^{n} a_j x^j, \qquad a_n \ne 0, \tag{5}$$

does not vanish at more than n points.

Proof
For any a we can write

$$p(x) = \sum_{j=0}^{n} \alpha_j (x - a)^j \tag{6}$$

simply by putting x = (x - a) + a in the original formula and multiplying
out. (Taylor's formula for polynomials!) It is clear that
$\alpha_0 = p(a)$; so if p(a) = 0, then each term in the sum in (6) contains
x - a, and

$$p(x) = (x - a)\,q(x) = (x - a) \sum_{j=1}^{n} \alpha_j (x - a)^{j-1}.$$

We can use induction on the number n, which is called the degree of p.
Since q has degree n - 1, it cannot vanish at more than n - 1 points,
and p cannot vanish except at these and at a.
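
The factoring step in this proof can be carried out mechanically. The following Python sketch (not from the text) divides p by x - a using Horner's scheme, which is one standard way to obtain the quotient q and the remainder p(a); the function name `deflate` and the sample cubic are assumptions made for illustration.

```python
def deflate(coeffs, a):
    """Divide p(x) = sum(coeffs[j] * x**j) by (x - a) using Horner's scheme.

    Returns (q, r) with p(x) = (x - a) * q(x) + r, where r = p(a) and q is
    given by its coefficient list in increasing powers of x.
    """
    n = len(coeffs) - 1          # degree of p
    q = [0] * n                  # q has degree n - 1
    carry = 0
    for j in range(n, 0, -1):
        carry = coeffs[j] + a * carry
        q[j - 1] = carry
    r = coeffs[0] + a * carry    # remainder, equal to p(a)
    return q, r

# Hypothetical example: p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3).
p = [-6, 11, -6, 1]
q, r = deflate(p, 1)
print(q, r)  # quotient coefficients and remainder p(1)
```

Each zero that is split off lowers the degree by one, which is exactly the induction used in the proof: a polynomial of degree n can be deflated at most n times.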


This takes care of the points that satisfy equation (4). The other theorem
from algebra is not so easy, and we shall simply assume it. The polynomial
F(x, y) is said to have a square factor if there exist polynomials G and H such
that $F = G^2H$. If this is the case, then $F_1 = GH$ vanishes at exactly the same
set of points, so in studying algebraic curves we can assume that F has no square
factor. The second theorem from algebra is the following:

THEOREM 1.2
If the polynomial F(x, y) has no square factor, then there exist polynomials
A(x, y), B(x, y), and R(x), with R not identically zero, such that

$$A(x, y)\,F(x, y) + B(x, y)\,\frac{\partial F}{\partial y}(x, y) = R(x). \tag{7}$$

THEOREM 1.3
If the polynomial F(x, y) has no square factor, then there are only a finite
number of points a such that the equations

$$F(a, y) = 0 \quad \text{and} \quad \frac{\partial F}{\partial y}(a, y) = 0 \tag{8}$$

have a common solution.


Proof
If the equations (8) do have a common solution, then by (7) we have
R(a) = 0, and by Theorem 1.1 this can happen for only a finite number
of a.

Exercise 1
Show that both theorems are false if F has a square factor.

Exercise 2
Calculate the A, B, and R for the polynomials $x^2 + y^2 - 1$, $xy^2 - 2y + 1$, and
$y^2 - x^3$, which appear in the exercises at the end of the last section.

Henceforth, we shall assume that F contains no square factor and will let
N be the finite set of points a such that either $A_n(a) = 0$ or the equations
F(a, y) = 0 and $\partial F(a, y)/\partial y = 0$ have a common solution. The main theorem
that can be proved is as follows.


THEOREM 1.4
Let $I = (\alpha, \beta)$ be any open interval that contains no point of the finite set N.
If for some $a \in I$ the equation F(a, y) = 0 has exactly k distinct solutions,
then for every $x \in I$ the equation F(x, y) = 0 has exactly k distinct
solutions, and there are functions $\varphi_1, \ldots, \varphi_k$ on I such that

(a) F(x, y) = 0 if and only if $y = \varphi_j(x)$ for some j.

(b) Each $\varphi_j$ is $C^1$ on I.

(c) $\varphi_j$ has a limit at $\alpha$ and at $\beta$, provided $\pm\infty$ are allowed.

In two respects the theorem provides more precise information than the
implicit-function theorem. It shows that the local solutions given by the latter
actually exist on the whole interval I and that the limits exist at the end points.
A typical picture of an algebraic curve would be as in Figure 1.

[Figure 1]

Proof
If b is one of the solutions of F(a, y) = 0, then since $a \in I$, it follows that
$\partial F(a, b)/\partial y \ne 0$; the implicit-function theorem provides a function $\varphi$ that is
$C^1$ on an interval $(a - \delta, a + \delta)$ and satisfies $\varphi(a) = b$ and $F(x, \varphi(x)) = 0$.

Let d be the upper bound of the numbers d' such that there is a
function $\psi$ that is $C^1$ on $(a - \delta, d')$ and satisfies $\psi(a) = b$ and $F(x, \psi(x)) = 0$.

LEMMA 1.5
Any two continuous functions $\psi_1$ and $\psi_2$ on an interval [a, d') which satisfy
the conditions

$$\psi(a) = b \quad \text{and} \quad F(x, \psi(x)) = 0$$

must be identical.

Proof
Since $\psi_1$ and $\psi_2$ are continuous, the subset of [a, d') on which $\psi_1 = \psi_2$ is a
closed subset. By the uniqueness in the implicit-function theorem, it is
also an open subset; hence it is the whole interval, since an interval is connected.

By Lemma 1.5 there is a unique function, which we shall call $\varphi$ again,
which is continuous on $(a - \delta, d)$ and satisfies $\varphi(a) = b$ and $F(x, \varphi(x)) = 0$,
and this $\varphi$ is automatically $C^1$. It is all right to call the new function $\varphi$, because
Lemma 1.5 shows that it coincides with the original $\varphi$ on $(a - \delta, a + \delta)$. We
have to show, of course, that $d = \beta$, but some other steps come first.


LEMMA 1.6
$\varphi$ has a limit at d.

Proof
Let $l_1$ be the limit inferior of $\varphi(x)$ as x approaches d from the left, and let
$l_2$ be the limit superior. If the two are different, let l be any number
between them. There must be points x' as close to d as we please with
$\varphi(x') < l$, and points x'' as close to d as we please with $\varphi(x'') > l$, and
therefore points x as close to d as we please with $\varphi(x) = l$. But this
implies that the equation F(x, l) = 0 has infinitely many solutions, which
is impossible by Theorem 1.1.

LEMMA 1.7
If $d < \beta$, then the limit of $\varphi$ at d must be finite.

Proof
Here is where the fact that the leading coefficient $A_n(d)$ does not vanish
comes in. Suppose that p(y) = 0, where

$$p(y) = \sum_{k=0}^{n} A_k y^k. \tag{9}$$

Then

$$A_n y^n = -\sum_{k=0}^{n-1} A_k y^k,$$

so if $|y| > 1$, then

$$|y| \le \frac{1}{|A_n|} \sum_{k=0}^{n-1} |A_k|. \tag{10}$$

Now, $y = \varphi(x)$ does satisfy the equation p(y) = 0 with $A_k = A_k(x)$, and each $A_k$ is
bounded on some interval with center d, while $A_n$ is bounded away from 0.
Hence (10) gives a bound for $\varphi(x)$.
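
The bound (10) is easy to check on a concrete polynomial. The Python sketch below (not from the text) evaluates the right-hand side of (10), combined with the trivial bound 1 for roots with $|y| \le 1$; the function name `root_bound` and the sample cubic with known roots 1, 2, 3 are assumptions made for illustration.

```python
def root_bound(A):
    """Bound on |y| for every root y of p(y) = sum(A[k] * y**k).

    A is the coefficient list [A_0, ..., A_n] with A_n != 0. Roots with
    |y| > 1 satisfy (10); the remaining roots satisfy |y| <= 1 trivially.
    """
    *lower, lead = A
    if lead == 0:
        raise ValueError("leading coefficient A_n must be nonzero")
    return max(1.0, sum(abs(c) for c in lower) / abs(lead))

# Hypothetical example: p(y) = (y - 1)(y - 2)(y - 3) = y^3 - 6y^2 + 11y - 6.
A = [-6, 11, -6, 1]
bound = root_bound(A)
print(bound, all(abs(y) <= bound for y in (1, 2, 3)))
```

The bound blows up as the leading coefficient tends to 0, which is why the points with $A_n(a) = 0$ had to be excluded from the interval I.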


Now we can show that $d = \beta$. If not, simply let l be the limit of $\varphi(x)$
as x approaches d and apply the implicit-function theorem at the point (d, l)
to extend $\varphi$ beyond d.

At this point the theorem is effectively proved. The same argument to
the left of a instead of to the right pushes the interval on which $\varphi$ is defined out
to the whole interval I. Using each of the solutions $b_1, \ldots, b_k$ of F(a, y) = 0,
we get solutions $\varphi_1, \ldots, \varphi_k$ which are $C^1$ on I, and, according to Lemma 1.6,
have limits (possibly $\pm\infty$) at $\alpha$ and $\beta$. If at some other point c there were
additional solutions to F(c, y) = 0, then we could start at c and perform the
same construction to obtain an additional solution at a. (Note Lemma 1.5,
which says that no two solutions can be equal at a single point unless they are
identical!)

This proof is an example of a fundamental process called analytic continuation. 
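
The continuation process can be imitated numerically. The Python sketch below (not from the text) follows one branch of a curve F(x, y) = 0 from a starting point (a, b) by moving x in small steps and correcting y with Newton's method, using the previous value of y as the starting guess. The polynomial $F(x, y) = y^2 - x$, for which N = {0}, the step count, and all function names are assumptions chosen for illustration; the proof itself uses the implicit-function theorem rather than Newton's method.

```python
def F(x, y):
    return y * y - x

def dFdy(x, y):
    return 2.0 * y

def newton_y(x, y, iters=30):
    """Correct y so that F(x, y) = 0, starting from a nearby guess."""
    for _ in range(iters):
        y -= F(x, y) / dFdy(x, y)
    return y

def continue_branch(a, b, x_end, steps=200):
    """Follow the solution of F(x, y) = 0 through (a, b) from x = a to x = x_end."""
    y = b
    for i in range(1, steps + 1):
        x = a + (x_end - a) * i / steps
        y = newton_y(x, y)  # the previous value of y is the starting guess
    return y

# On I = (0, infinity) there are two branches, through (1, 1) and (1, -1).
print(continue_branch(1.0, 1.0, 4.0), continue_branch(1.0, -1.0, 4.0))
```

The two branches never meet on the interval, in agreement with Lemma 1.5, and they can be followed toward either end point as long as x stays away from the exceptional set N.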


Exercise 3
Go through the exercises at the end of the last section again in the light of the
general theorem. Find the points in N, examine their significance, and so on.

Remark
The results of this section, which depend partly on the unproved algebraic
Theorem 1.2, are included for their intrinsic interest. They are not results
that will be needed in the sequel.


MANIFOLDS 


In the next few sections we shall study various features of smooth surfaces in $R^n$.
There are several ways to define such surfaces. A curve in the plane
(one-dimensional surface in $R^2$) has appeared in the following guises:

(a) $S = \{(x, y): y = f(x)\}$.
(b) $S = \{(x, y): F(x, y) = 0\}$.
(c) $S = \varphi(I)$, where $\varphi: I \to R^2$.

In (a), S is the graph of the function f. In (b), S is perhaps an algebraic
curve of the kind discussed in Section 1. In (c), S is a parametric curve or a
path.

Consider the analogs in $R^3$.

(a) $S = \{(x, y, z): z = f(x, y)\}$.
(b) $S = \{(x, y, z): F(x, y, z) = 0\}$.
(c) $S = \varphi(I)$, where $\varphi: I \to R^3$.

In (a), S is the graph of the function f. It is not a curve, but a
two-dimensional surface. For instance, the equations

$$z = 2x - 3y + 1 \quad \text{and} \quad z = \sqrt{1 - x^2 - y^2}$$

define respectively a plane and the top half of the unit sphere. In (b), S is
again a two-dimensional surface. For instance, the equation
$x^2 + y^2 + z^2 - 1 = 0$ defines the unit sphere. In (c), however, if I is an interval, then S is a
parametric curve or path. The way to get a two-dimensional surface out of (c)
is to take I to be a square (or disk or some other two-dimensional figure).
The way to get a curve out of (b) is to intersect a pair of surfaces, that is, to
take a pair of equations F(x, y, z) = 0 and G(x, y, z) = 0 or, equivalently, to
take a single equation F(x) = 0, where F is a function from $R^3$ to $R^2$.


Exercise 1
What is the curve in $R^3$ described by the pair of equations $2x - 3y - z + 1 = 0$
and $x^2 + y^2 + z^2 - 1 = 0$?

The way to get a curve out of (a) is also to intersect a pair of surfaces, but
in this case the surfaces are usually taken to be special ones:

$$S = \{(x, y, z): y = f(x) \text{ and } z = g(x)\}. \tag{a'}$$

Exercise 2
What is special about these surfaces?

We are going to use the n-dimensional analog of (b) for the basic definition
of a surface in $R^n$, and then, of course, there will be a basic problem of showing
how to pass back and forth between this and the analogs of (a) and (c). The
analog of (b) is to define an m-dimensional surface in $R^n$ to be a set of points
satisfying n - m equations $F_i(x) = 0$, or, equivalently, a single equation
F(x) = 0, where F is a function from $R^n$ to $R^{n-m}$. In order to obtain a set
with some resemblance to an intuitive m-dimensional surface, we shall have to
put some restrictions on F, for we have the following result.

Exercise 3
If S is any closed set whatever in $R^n$, let F(x) = d(x, S) be the distance from x
to S. Show that $F: R^n \to R^1$ is continuous and that $S = \{x: F(x) = 0\}$.

It is even possible to modify F so that it is of class $C^1$ on $R^n$ (in fact of class
$C^\infty$, although we have not defined this yet). (See Exercise 18.) According to
the tentative definition above, this would mean that every closed set in $R^n$ is a
surface of dimension n - 1, which certainly is not desirable. What we shall do
is impose a condition on F that makes the surface smooth and of the right
dimension.

DEFINITION 2.1
A point $a \in R^p$ is a regular point of a function $f: R^p \to R^q$ if f is of class $C^1$
on a neighborhood of a and df(a) has the maximum possible rank. If $p \le q$,
this means that the rank is p, and hence that df(a) is one to one. If $p \ge q$,
this means that the rank is q, and hence that df(a) maps $R^p$ onto $R^q$.

DEFINITION 2.2
A smooth surface or smooth manifold of dimension m in $R^n$ is a set M with the
following property: For each point $a \in M$ there is a function $F: R^n \to R^{n-m}$
which is regular on an open set G containing a and is such that

$$M \cap G = \{x: F(x) = 0, \; x \in G\}.$$

Within the open set G, M is exactly the set of points satisfying the equation
F(x) = 0. Ordinarily we shall abbreviate this by saying simply that
$M = \{x: F(x) = 0\}$ on a neighborhood of a.
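
To see the regularity condition at work in the simplest case, the Python sketch below (not from the text) checks it for the unit sphere, described by $F(x, y, z) = x^2 + y^2 + z^2 - 1$. Since F maps $R^3$ to $R^1$, the maximum possible rank of dF is 1, so a point is regular exactly when the gradient is nonzero. The function names and the sample points are assumptions made for illustration.

```python
import math

def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2 - 1 (the single row of dF)."""
    return (2.0 * x, 2.0 * y, 2.0 * z)

def is_regular(x, y, z):
    """For F: R^3 -> R^1, rank 1 means that the gradient is not the zero vector."""
    return any(c != 0.0 for c in grad_F(x, y, z))

# A few sample points on the unit sphere, in spherical coordinates.
points = [
    (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))
    for u in (0.3, 1.2, 2.5)
    for v in (0.0, 2.0, 4.0)
]
all_regular = all(is_regular(*p) for p in points)
print(all_regular, is_regular(0.0, 0.0, 0.0))
```

The only point where F fails to be regular is the origin, which does not lie on the sphere, so the sphere satisfies Definition 2.2 with m = 2 and n = 3.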


Example
If $f: R^m \to R^{n-m}$ is of class $C^1$ on an open set $G \subset R^m$, then the graph of f is a
smooth m-dimensional manifold in $R^n$.

In discussing the example we shall introduce some notation that will be
standard through the next few sections. If x is a point of $R^n$, then $\tilde{x}$ will denote
the first m coordinates of x. If y is a point of $R^{n-m}$, the coordinates of y will be
written $(y_{m+1}, \ldots, y_n)$, rather than $(y_1, \ldots, y_{n-m})$. In this notation the
graph of f is the set

$$M = \{x \in R^n: x_i = f_i(\tilde{x}), \; i > m, \; \tilde{x} \in G\}.$$

The notation allows the use of matching indices on both sides of the equation
$x_i = f_i(\tilde{x})$ and makes it easier to keep track of the indices.

To show that the graph of f is a smooth m-dimensional manifold, let
$F_i(x) = x_i - f_i(\tilde{x})$, and let $F: R^n \to R^{n-m}$ be the function with coordinate
functions $F_i$, i > m. It is plain that

$$M = \{x: F(x) = 0, \; \tilde{x} \in G\},$$

so the problem is to show that F is regular at each point. The Jacobi matrix
of F looks like this:

$$\begin{pmatrix} -\dfrac{\partial f}{\partial \tilde{x}} & I \end{pmatrix},$$

where I is the (n - m) by (n - m) identity matrix. The last n - m columns
are plainly independent, so dF has rank n - m and F is regular.


THEOREM 2.3
Every smooth m-dimensional manifold M in $R^n$ is locally the graph of a $C^1$
function. More precisely, if a is a point of M, then with a suitable relabeling
of the coordinates there is a $C^1$ function $f: R^m \to R^{n-m}$ on a neighborhood of
$\tilde{a}$ such that

$$M = \{x: x_i = f_i(\tilde{x}), \; i > m\} \quad \text{on a neighborhood of } a.$$

Exercise 4
Prove the theorem by using the implicit-function theorem.

Theorem 2.3 provides a local parametric representation of a smooth
manifold.

THEOREM 2.4
Let a be a point of a smooth m-dimensional manifold M in $R^n$. There is a
function $\varphi: R^m \to R^n$ which is regular on an open set G in $R^m$ and such that
$\varphi(G)$ is a neighborhood of a in M. There is an inverse function $P: R^n \to R^m$
which is $C^1$ on a neighborhood of a in $R^n$ and satisfies $P \circ \varphi = I$ on a
neighborhood of Pa in $R^m$, and $\varphi \circ P = I$ on a neighborhood of a in M.

Such a function $\varphi$ is called a local parametric representation of M at a. Note
that $\varphi(G)$ is not a neighborhood of a in $R^n$, but rather there is such a
neighborhood $G_1$ such that $\varphi(G) = M \cap G_1$. The inverse function P is defined on the
full neighborhood $G_1$, and is $C^1$ there, but it is the restriction of P to M that is the
inverse of $\varphi$.

Proof
Use Theorem 2.3 to write M as the graph of a function f in a neighborhood
of a, and set $\varphi(t) = (t, f(t))$. It is immediately checked that this does the
job and that the inverse function is just the projection $Px = \tilde{x}$.
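
The construction in this proof can be written out for a concrete manifold. The Python sketch below (not from the text) takes the upper half of the unit sphere in $R^3$, which is the graph of $f(t_1, t_2) = \sqrt{1 - t_1^2 - t_2^2}$, and checks that the projection P undoes the parametrization $\varphi(t) = (t, f(t))$. The choice of manifold and the function names are assumptions made for illustration.

```python
import math

def f(t):
    """Height of the upper unit hemisphere over the point t = (t1, t2)."""
    return math.sqrt(1.0 - t[0] ** 2 - t[1] ** 2)

def phi(t):
    """Local parametric representation phi(t) = (t, f(t))."""
    return (t[0], t[1], f(t))

def P(x):
    """Projection on the first m = 2 coordinates; defined on all of R^3."""
    return (x[0], x[1])

t = (0.3, 0.4)
x = phi(t)
print(P(x) == t, sum(c * c for c in x))  # P(phi(t)) = t, and phi(t) lies on the sphere
```

Note that P is defined on all of $R^3$, but only its restriction to the hemisphere inverts $\varphi$, as remarked after the statement of the theorem.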


COROLLARY 2.5
The dimension of a smooth manifold is uniquely determined; a given set M
cannot be a smooth manifold of two different dimensions.

Proof
Suppose that $M \subset R^n$ is a smooth manifold of dimension m and also of
dimension k. Use the theorem to find local parametric representations
$\varphi: R^m \to R^n$ and $\psi: R^k \to R^n$. Let P and Q be the inverse functions, and
set $f = Q \circ \varphi$ and $g = P \circ \psi$. Then $f \circ g = I$ on a neighborhood of Qa
and $g \circ f = I$ on a neighborhood of Pa. The chain rule gives the same
equations for the differentials and shows that $df(Pa): R^m \to R^k$ is both one
to one and onto, which implies that m = k.

Exercise 5
Draw a picture to illustrate the proof.

Exercise 6
Let $M \subset R^n \subset R^r$. Show that if M is a smooth m-dimensional manifold in $R^n$,
then it is also a smooth m-dimensional manifold in $R^r$. (This was tacitly used
in the proof of Corollary 2.5.)


THEOREM 2.6
Let M be a smooth m-dimensional manifold in $R^n$. Let $\psi: R^m \to M$ be
regular at the point $t_0$. Then $\psi$ is a local parametric representation of M at
the point $a = \psi(t_0)$.

The theorem says that if G is any open set containing $t_0$, then $\psi(G)$ is a
neighborhood of $a = \psi(t_0)$ in M, and there is a $C^1$ inverse function Q on some
neighborhood of a in $R^n$. Before turning to the proof, consider the example of
a figure six in the plane. It is obvious that there is a one-to-one regular function
$\psi: (0, 1) \to R^2$ that traces out the figure six. It is equally obvious that the
inverse function is not even continuous at the point where the six comes together.
Consequently, the theorem shows that the figure six is not a smooth
one-dimensional manifold, or even contained in one.

Proof
Use Theorem 2.3 to write M as the graph of a function f in a neighborhood
of a, and let P be the projection on the first m coordinates. The first step
is to show that the function $g = P \circ \psi$ is regular at $t_0$. Since P is linear,
we have

$$dg(t_0) = P\, d\psi(t_0) = d\tilde{\psi}(t_0).$$

Since $\psi(t)$ lies on the graph of f, we have $\psi_i(t) = f_i(\tilde{\psi}(t))$ for i > m; hence
$d\psi_i(t_0) = df_i(\tilde{a})\, d\tilde{\psi}(t_0)$. From this it follows that if $d\tilde{\psi}(t_0)h = 0$, then
$d\psi(t_0)h = 0$, which is possible only if h = 0, because $\psi$ is regular. Thus,
g is regular.

According to the inverse-function theorem, g has a local inverse h that
is defined on a neighborhood of Pa. The function we are looking for to
invert $\psi$ is just $Q = h \circ P$.

Q is plainly of class $C^1$ on a neighborhood of a and satisfies

$$Q \circ \psi = h \circ P \circ \psi = h \circ g = I$$

on a neighborhood of $t_0$. If $\varphi$ is the local parametric representation of M
coming from the graph [that is, $\varphi(t) = (t, f(t))$ as in the proof of Theorem
2.4], then P is the inverse function to $\varphi$; so we have $\varphi \circ g = \varphi \circ P \circ \psi = \psi$,
hence

$$\psi \circ Q = \varphi \circ g \circ Q = \varphi \circ g \circ h \circ P = I$$

on a neighborhood of a in M.


Exercise 7
Does the last equation imply that if G is open, then $\psi(G)$ is a neighborhood of
a in M?

THEOREM 2.7
Let $\varphi: R^m \to R^n$, $m \le n$, be regular at the point $t_0$. If r is small enough,
then $M = \varphi(B(t_0, r))$ is a smooth m-dimensional manifold in $R^n$, and $\varphi$
is a local parametric representation.

Proof
Let $a = \varphi(t_0)$. The idea is to relabel the x coordinates so as to be able to
solve the equations

$$0 = G_i(x, t) = x_i - \varphi_i(t), \qquad i = 1, \ldots, n,$$

for $x_{m+1}, \ldots, x_n, t_1, \ldots, t_m$ on a neighborhood of $(a, t_0)$. If we can
find solutions

$$x_i = f_i(\tilde{x}) \text{ for } i > m, \qquad t = g(\tilde{x}),$$

then we shall have expressed M as the graph of f in a neighborhood of a,
and so will know that it is a smooth m-dimensional manifold.
The Jacobi matrix of the function $G: R^{n+m} \to R^n$ is the matrix

$$\begin{pmatrix} I & -\dfrac{\partial \varphi}{\partial t} \end{pmatrix},$$

where I is the n by n identity matrix. This matrix has rank n, and the
regularity of $\varphi$ means that the last m columns are linearly independent.
Therefore, by rearranging the first n columns, we can ensure that the last
n columns are linearly independent. Now use the implicit-function
theorem to solve. There are positive numbers $\varepsilon$ and $\delta$ such that if
$|\tilde{x} - \tilde{a}| < \delta$, then there is one and only one point (x, t), $x_i = f_i(\tilde{x})$ for
i > m and $t = g(\tilde{x})$, satisfying G(x, t) = 0, $|x - a| < \varepsilon$, and $|t - t_0| < \varepsilon$.


Exercise 8
Check that if r is sufficiently small, then $\varphi(B(t_0, r))$ is the graph of f in a
neighborhood of a. (Note that the example of the figure six shows that the restriction to
small r is absolutely necessary.)

Knowing that M is a smooth m-dimensional manifold, we can conclude
from Theorem 2.6 that $\varphi$ is a local parametric representation.

Remark
In the theorems of this section we have used the term smooth manifold in
preference to the term smooth surface, and we shall continue to do so. The
term smooth manifold is more in current favor, and also we shall want to use the
term surface (though never smooth surface) in a slightly vague and nontechnical
way. We shall speak of the surface F = 0 or the parametric surface $\varphi$ in cases
where we do not assume a priori that F or $\varphi$ is regular. In such cases we shall
have to realize that the surface may not be a surface at all from the intuitive
point of view, but still the language is convenient. Moreover, it is usually
true that F or $\varphi$ is regular at "most" points, so if we remove a "small" set from
the surface, then what is left is actually a smooth manifold. In the case of an
algebraic curve, for example, the results of the last section show that if we
remove a finite number of points, then what is left is a smooth one-dimensional
manifold.


Exercise 9
Draw a picture of the surface in $R^3$ defined by $z^2 = x^2 + y^2$. Show that if the
origin is removed, then what is left is a smooth two-dimensional manifold.

Exercise 10
The surface in $R^3$ defined by $x^2 - y^2 = 0$ consists of two planes whose
intersection is the z axis. If the z axis is removed, then what is left is a smooth
two-dimensional manifold.

Exercise 11
Discuss the curve in $R^2$ that is defined in polar coordinates by the equation
$r = \sin 2\theta$. Draw a picture. Show that the same curve is defined by the
equation $0 = F(x, y) = (x^2 + y^2)^3 - 4x^2y^2$. Show that F is regular at every
point except 0, and hence that if 0 is removed, then what is left is a smooth
one-dimensional manifold. Show that the same curve is defined parametrically
by the equations

$$x = 2 \sin\theta \cos^2\theta, \qquad y = 2 \sin^2\theta \cos\theta.$$

Which points are regular for the corresponding function $\varphi$? Discuss this in the
light of Theorem 2.7.

Exercise 12
Show that the torus (= hollow doughnut) obtained by revolving the circle
$x^2 + (y - 2)^2 = 1$ around the x axis has the equation

$$0 = F(x, y, z) = (x^2 + y^2 + z^2 + 3)^2 - 16(y^2 + z^2).$$

Show that F is regular at each point of the torus, and hence that the torus is a
two-dimensional manifold.


Exercise 13
If M is a smooth manifold, then each point $a \in M$ has a neighborhood in M
that is connected. [Hint: Theorems 2.4 and 2.7 show that each point has a
neighborhood of the form $\varphi(B(t_0, r))$, where $\varphi$ is a local parametric representation.]

Exercise 14
A compact smooth manifold has only a finite number of connected components.
(Hint: If there is an infinite number of components, pick a sequence $\{x_k\}$ in
distinct components and apply Exercise 13 at a point a which is a limit of some
subsequence.)

The two extreme cases, zero-dimensional manifolds in $R^n$ and n-dimensional
manifolds in $R^n$, are easy to describe.

THEOREM 2.8
A set $M \subset R^n$ is a smooth zero-dimensional manifold if and only if M is
isolated, that is, for each point $a \in M$ there is an r > 0 such that
$M \cap B(a; r) = \{a\}$.

Exercise 15
Prove the theorem by using the inverse-function theorem.

Exercise 16
A compact smooth zero-dimensional manifold consists of a finite number of
points. (Hint: Use Exercise 14 and Theorem 2.8.)

THEOREM 2.9
A set $M \subset R^n$ is a smooth n-dimensional manifold if and only if M is open
in $R^n$.

Exercise 17
Prove the theorem. (You should interpret $R^0$ as a zero-dimensional space;
that is, $R^0 = \{0\}$. In this case, the function F in Definition 2.2 is the function
identically 0.)

In Section 3 we shall also describe the connected one-dimensional manifolds.

Exercise 18
Every closed set in $R^n$ is the set of zeros of a $C^1$ function. (Hint: (a) If B is an
open ball, then there is a nonnegative $C^1$ function f such that $B = \{x: f(x) > 0\}$.
(b) Every open set in $R^n$ is the union of a sequence of open balls. (c) Now let
F be the given closed set, write the complement as the union of a sequence of
open balls $B_k$, and choose $f_k$ as in (a). If $M_k$ is the maximum of $f_k$ and its
first derivatives, and $a_k = 1/(2^k M_k)$, then

$$f = \sum_k a_k f_k$$

has the desired property. [You can also use cubes instead of balls, in which
case it is enough to do (a) in dimension 1 and then take a product of the resulting
functions.])


3


TANGENT SPACES


DEFINITION 3.1
Let M be a smooth m-dimensional manifold in $R^n$ given by an equation F = 0
in a neighborhood of a point a. If a is a regular point of F, then the tangent
space to M at the point a, written $T_a(M)$, is the null space of dF(a). The
tangent plane is the parallel plane through a.

According to Corollary 2.5, F must be a function from $R^n$ to $R^{n-m}$. Since
F is regular, the dimension of the range of dF(a) must be n - m; therefore, the
dimension of the null space must be m. Thus, the tangent space is a subspace
of the same dimension m as the manifold. It is not immediately apparent that
the definition makes sense, however, for a given manifold can always be described
by many different equations.

Exercise 1
If M is described by F = 0 in a neighborhood of a, and $G = \sum F_i^2$, then M is also
described by G = 0. In this case, however, G is not regular at any point of M,
so it is of no use in Definition 3.1.

Exercise 2
If M is described by F = 0 in a neighborhood of a, and G = pF, where p is a
positive real-valued $C^1$ function on a neighborhood of a, then M is also described
by G = 0. If F is regular at a, then so is G.

THEOREM 3.2
If $\varphi$ is a local parametric representation of the smooth manifold M, then the
tangent space to M at the point $a = \varphi(t_0)$ is the range of $d\varphi(t_0)$.

Proof
Let M be m-dimensional and let it be given by the equation F = 0 in a
neighborhood of a, where F is regular at a. For every t close to $t_0$, $\varphi(t)$
lies on M, so $F(\varphi(t)) = 0$; that is, $F \circ \varphi = 0$. From the chain rule it
follows that $dF(a)\, d\varphi(t_0) = 0$, which shows that the range of $d\varphi(t_0)$ is
contained in the tangent space. Since $\varphi$ is regular, the range of $d\varphi(t_0)$ has
dimension m, the same dimension as the tangent space. Therefore, the
range of $d\varphi(t_0)$ is equal to the tangent space.

This theorem shows that Definition 3.1 does make sense. The tangent
space is determined by the manifold M, not by the equation F = 0 that is used
to define it (as long as F is regular).
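
The key identity in the proof of Theorem 3.2, $dF(a)\, d\varphi(t_0) = 0$, can be verified numerically. The Python sketch below (not from the text) does this for the unit sphere $F(x) = x_1^2 + x_2^2 + x_3^2 - 1$ with the graph parametrization of the upper hemisphere; the choice of manifold, the base point, and the function names are assumptions made for illustration.

```python
import math

def f(t1, t2):
    """Height of the upper unit hemisphere over (t1, t2)."""
    return math.sqrt(1.0 - t1 * t1 - t2 * t2)

def dphi_columns(t1, t2):
    """Columns of the Jacobi matrix of phi(t) = (t1, t2, f(t1, t2))."""
    z = f(t1, t2)
    return [(1.0, 0.0, -t1 / z), (0.0, 1.0, -t2 / z)]

def dF(a):
    """Row matrix of dF(a) for F(x) = |x|^2 - 1, that is, the gradient 2a."""
    return tuple(2.0 * c for c in a)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

t0 = (0.3, 0.4)
a = (t0[0], t0[1], f(*t0))
# Entries of dF(a) dphi(t0): each should vanish up to rounding error.
products = [dot(dF(a), col) for col in dphi_columns(*t0)]
print(products)
```

The two columns of $d\varphi(t_0)$ are independent and lie in the null space of dF(a), which has dimension 2, so they span the tangent space, as the theorem asserts.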


Exercise 3
Prove this last statement.

THEOREM 3.3
If M and N are smooth manifolds with $M \subset N$ in a neighborhood of $a \in M$,
then the tangent space to M at a is contained in the tangent space to N at a.

Proof
To say that $M \subset N$ in a neighborhood of a means, of course, that there
is a ball B(a; r) such that $M \cap B(a; r) \subset N$. Let $\varphi$ be a local parametric
representation of M at a with $\varphi(t_0) = a$, and let N be defined by G = 0
in a neighborhood of a with G regular. Then $G \circ \varphi = 0$, so that
$dG(a)\, d\varphi(t_0) = 0$. Theorem 3.2 finishes the job.


Now we shall turn to more geometrical characterizations of the tangent
space. In Chapter 8 we defined the tangent vector to a path $\varphi$ in $R^n$ at a
regular point $t_0$ to be the vector $\varphi'(t_0)$. This is related to the differential $d\varphi(t_0)$
by the formula

$$d\varphi(t_0)h = h\varphi'(t_0),$$

so the range of the differential is simply the line determined by the tangent
vector. If B is a small-enough ball with center $t_0$, then we know (Theorem 2.7)
that $\varphi(B)$ is a smooth one-dimensional manifold, and Theorem 3.2 shows that
the tangent line to $\varphi(B)$ at $\varphi(t_0)$ is simply the line determined by the tangent
vector $\varphi'(t_0)$. If it happens that the path $\varphi$ lies on a smooth manifold M, then
Theorem 3.3 shows that the tangent vector $\varphi'(t_0)$ lies in the tangent space to
the manifold. This proves half of the following nice characterization of the
tangent space.

THEOREM 3.4
Let a be a point of a smooth manifold M. The tangent space to M at a
consists of all tangent vectors at a to paths on M that pass through a.

Proof
Let M be m-dimensional in $R^n$ and choose the coordinates so that in a
neighborhood of a it is the graph of a function f, that is, is given by equations

$$0 = F_i(x) = x_i - f_i(\tilde{x}), \qquad i > m. \tag{1}$$

To prove the remaining half of the theorem we have to start with an
arbitrary vector h in the tangent space $T_a(M)$ and produce a path $\varphi$ on M
that is defined (for example) on a neighborhood of $0 \in R^1$ and satisfies
$\varphi(0) = a$ and $\varphi'(0) = h$. The idea is to project h down on the space of
the first m coordinates, to take the line in that space with direction $\tilde{h}$, and
then to pull the line back up to the surface. This is accomplished by
defining $\varphi(t) = (\tilde{a} + t\tilde{h}, f(\tilde{a} + t\tilde{h}))$; that is,

$$\tilde{\varphi}(t) = \tilde{a} + t\tilde{h} \quad \text{and} \quad \varphi_i(t) = f_i(\tilde{a} + t\tilde{h}) \text{ for } i > m. \tag{2}$$

Straight from the definition it follows that $\varphi(t)$ does lie on M if t is small
and that $\varphi(0) = a$. Differentiation of (2) gives

$$\tilde{\varphi}'(0) = \tilde{h} \quad \text{and} \quad \varphi_i'(0) = \sum_{j=1}^{m} \frac{\partial f_i(\tilde{a})}{\partial x_j} h_j \text{ for } i > m. \tag{3}$$

On the other hand, the fact that h is in the tangent space gives

$$h_i = \sum_{j=1}^{m} \frac{\partial f_i(\tilde{a})}{\partial x_j} h_j \text{ for } i > m. \tag{4}$$

Comparison of (3) and (4) shows that $\varphi'(0) = h$.

The tangent plane can also be described as the set of limits of chords.
The tangent plane can also be described as the set of limits of chords. 


THEOREM 3.5
Let a be a point of the smooth manifold M. A unit vector $\theta$ lies in the
tangent space to M at a if and only if there is a sequence $\{x_k\}$ in M such that

$$x_k \to a \quad \text{and} \quad \frac{x_k - a}{|x_k - a|} \to \theta. \tag{5}$$

Proof
If $\theta$ is in the tangent space, choose a path $\varphi$ on M with $\varphi(0) = a$ and
$\varphi'(0) = \theta$, take any sequence $t_k > 0$, $t_k \to 0$, and set $x_k = \varphi(t_k)$.

Exercise 4
In Chapter 8 we already checked that such a sequence has the property (5),
but check it again.

On the other hand, if $\{x_k\}$ is a sequence in M with the property (5), and
M is defined in a neighborhood of a by F = 0, then we have

$$0 = F(x_k) - F(a) = dF(a)(x_k - a) + \varepsilon(x_k - a)\,|x_k - a|,$$

so

$$dF(a)\,\frac{x_k - a}{|x_k - a|} = -\varepsilon(x_k - a) \to 0.$$

From this it follows that $dF(a)\theta = 0$, so $\theta$ is in the tangent space.
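
The limit of chords in Theorem 3.5 can be watched numerically. The Python sketch below (not from the text) takes M to be the unit circle, a = (1, 0), and the path $\varphi(t) = (\cos t, \sin t)$ with $\varphi'(0) = (0, 1)$, and measures how far the normalized chords from a to $x_k = \varphi(1/k)$ are from the unit tangent vector. The choice of manifold and sequence and the function names are assumptions made for illustration.

```python
import math

def phi(t):
    """Path on the unit circle with phi(0) = (1, 0) and phi'(0) = (0, 1)."""
    return (math.cos(t), math.sin(t))

def chord_direction(x, a):
    """Unit vector (x - a) / |x - a|."""
    d = (x[0] - a[0], x[1] - a[1])
    n = math.hypot(d[0], d[1])
    return (d[0] / n, d[1] / n)

a = phi(0.0)
theta = (0.0, 1.0)  # unit tangent vector to the circle at a

errors = []
for k in (10, 100, 1000, 10000):
    c = chord_direction(phi(1.0 / k), a)
    errors.append(math.hypot(c[0] - theta[0], c[1] - theta[1]))
print(errors)  # distances from the chord directions to theta
```

The distances shrink roughly in proportion to 1/k, in agreement with property (5).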


Exercise 5
What are the equations for the tangent space to a manifold in graph form,
$x_i = f_i(\tilde{x})$, i > m?

DEFINITION 3.6
The normal to a smooth manifold $M \subset R^n$ at a point $a \in M$ is the
orthogonal complement of the tangent space.

THEOREM 3.7
If M is given by F = 0 in a neighborhood of a, $F: R^n \to R^{n-m}$ regular at a,
then the normal to M at a is the space spanned by $\nabla F_{m+1}(a), \ldots, \nabla F_n(a)$.

Proof
For any $h \in R^n$, we have $dF_i(a)h = (\nabla F_i(a), h)$, so that dF(a)h = 0 if and
only if h is orthogonal to each $\nabla F_i(a)$.


THEOREM 3.8 If M is a smooth manifold with local parametric representation $\varphi: R^m \to R^n$, then the normal to M at $\varphi(t_0)$ is the set of vectors $h \in R^n$ satisfying

$$\sum_{i=1}^{n} \frac{\partial \varphi_i(t_0)}{\partial t_j}\, h_i = 0 \quad\text{for } j = 1, \ldots, m.$$

Exercise 6 Prove the theorem. (Hint: The orthogonal complement of the range is the null space of the adjoint.)

Exercise 7 What is the normal to the surface $x_n = f(x_1, \ldots, x_{n-1})$?

Remark If $M \subset R^n \subset R^p$ is a smooth manifold, then the tangent space to M at a point is the same whether we look at M as a subset of $R^n$ or of $R^p$. The normal, on the contrary, is quite different. On the one hand, we take the orthogonal complement relative to $R^n$ and on the other relative to $R^p$. The normal relative to $R^n$ is the intersection of $R^n$ with the normal relative to $R^p$.

Exercise 8 Prove the assertions in the remark.

It is interesting (and the results will be useful later) to analyze the one-dimensional manifolds, particularly the connected ones. In this case the arc length provides local parametric representations that can be pieced together to give a parametric representation of the whole manifold.

DEFINITION 3.9 Let M be a smooth one-dimensional manifold. A local parametric representation by arc length on an open interval I is a $C^1$ function $\varphi: I \to M$ such that $|\varphi'(t)| = 1$ for each $t \in I$.


Since the arc length of the path $\varphi$ on an interval $[\sigma, \tau]$ is just

$$\int_\sigma^\tau |\varphi'(t)|\, dt,$$

it follows that the condition $|\varphi'(t)| = 1$ means simply that the arc length on any interval $[\sigma, \tau]$ is $\tau - \sigma$. In particular, we have

$$|\varphi(t) - \varphi(s)| \le |t - s| \quad\text{for any } s, t \in I \tag{6}$$

(which can be proved in many other ways too). From the results of Section 6 of Chapter 8 it follows that if a is a given point of the smooth one-dimensional manifold M and $t_0$ is a given point of $R^1$, then M has a local parametric representation by arc length that is defined on a neighborhood of $t_0$ and satisfies $\varphi(t_0) = a$. Our problem is to piece together these local parametric representations by arc length.
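The meaning of the condition $|\varphi'(t)| = 1$ can be checked on a concrete curve. The sketch below assumes, purely for illustration, that M is the unit circle with $\varphi(t) = (\cos t, \sin t)$; the helper names `speed` and `arc_length` are hypothetical. It estimates the speed by a central difference, approximates arc length by inscribed polygons, and compares a chord with inequality (6).

```python
import math

def phi(t):
    """Unit circle parametrized by arc length (illustrative choice of M)."""
    return (math.cos(t), math.sin(t))

def speed(path, t, h=1e-6):
    """Central-difference estimate of |path'(t)|."""
    return math.dist(path(t - h), path(t + h)) / (2 * h)

def arc_length(path, a, b, steps=20000):
    """Polygonal approximation of the arc length of path on [a, b]."""
    total = 0.0
    prev = path(a)
    for k in range(1, steps + 1):
        cur = path(a + (b - a) * k / steps)
        total += math.dist(prev, cur)
        prev = cur
    return total

s, t = 0.5, 2.0
print(speed(phi, 0.7))                 # close to 1
print(arc_length(phi, s, t), t - s)    # arc length on [s, t] versus t - s
print(math.dist(phi(s), phi(t)))       # chord length, at most |t - s| by (6)
```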


LEMMA 3.10 Let $\varphi, \psi: I \to M$ be local parametric representations by arc length. If $\varphi(t_0) = \psi(t_0)$ and $\varphi'(t_0) = \psi'(t_0)$ for some point $t_0$ in I, then $\varphi(t) = \psi(t)$ for all $t \in I$.

Proof It is enough to show that $\varphi(t) = \psi(t)$ on a neighborhood of $t_0$, for this will show that the set of points t with $\varphi(t) = \psi(t)$ and $\varphi'(t) = \psi'(t)$ is both open and closed in I, and hence equal to all of I. Let P be the usual local inverse of $\varphi$ and set $f = P \circ \psi$, so $\psi = \varphi \circ f$, and hence

$$\psi'(t) = \varphi'(f(t))\, f'(t).$$

Since both $\varphi'$ and $\psi'$ have absolute value 1, it follows that $f'$ has absolute value 1, and therefore that $f'$ is either identically +1 or identically -1. Now, $f(t_0) = t_0$ and $\varphi'(t_0) = \psi'(t_0)$, so it must be that $f'$ is identically +1, and therefore that $f(t) = t$ for all t near $t_0$.

LEMMA 3.11 Let $\varphi_1: I_1 \to M$ and $\varphi_2: I_2 \to M$ be local parametric representations by arc length. If $\varphi_1(I_1) \cap \varphi_2(I_2) \ne \emptyset$, then $\varphi_1$ has an extension $\varphi: I \to M$, which is a local parametric representation by arc length satisfying $\varphi(I) = \varphi_1(I_1) \cup \varphi_2(I_2)$.

Proof Let $a = \varphi_1(t_1) = \varphi_2(t_2)$, and set

$$\psi_2(t) = \varphi_2(t - t_1 + t_2) \quad\text{on } t_1 - t_2 + I_2.$$

It is plain that $\psi_2$ is again a local parametric representation by arc length and that $\psi_2(t_1) = \varphi_2(t_2) = \varphi_1(t_1)$. Since the tangent space to M at a is one-dimensional, there are just two possibilities for $\psi_2'(t_1)$: It is either $\varphi_1'(t_1)$ or $-\varphi_1'(t_1)$. In the first case put $\psi = \psi_2$, and in the second case put $\psi(t) = \psi_2(2t_1 - t)$, which just reverses the direction. By Lemma 3.10, $\psi = \varphi_1$ on the interval where both are defined; so we can put $\varphi = \varphi_1$ on $I_1$ and $\varphi = \psi$ on the interval where $\psi$ is defined (which is $t_1 - t_2 + I_2$ in the first case and $t_1 + t_2 - I_2$ in the second). This function $\varphi$ does the job.

THEOREM 3.12 Let M be a connected smooth one-dimensional manifold. There is a local parametric representation by arc length $\varphi: I \to M$ with $\varphi(I) = M$.

Proof Choose a point $a \in M$ and a unit tangent vector $\theta$ at a. Let I be the union of all intervals J containing 0 such that there is a local parametric representation by arc length $\varphi_J: J \to M$ with $\varphi_J(0) = a$ and $\varphi_J'(0) = \theta$. If J is small enough, then $\varphi_J$ certainly exists (why?); and if J and K are any two such intervals, then by Lemma 3.10, $\varphi_J = \varphi_K$ on $J \cap K$. Consequently, the $\varphi_J$ determine a $\varphi$ that is defined on the whole interval I. By the construction there is no extension of $\varphi$ to a local parametric representation by arc length on a larger interval.


From the definition of a local parametric representation it follows that $\varphi(I)$ is a neighborhood in M of each point $\varphi(t_0)$. In other words, $\varphi(I)$ is open in M. If we can show that it is also closed in M, then by connectedness we shall have $\varphi(I) = M$. If $\varphi(I)$ is not closed in M, let $b \in M$ be a point in the closure, but not in $\varphi(I)$ itself. Let $\psi: J \to M$ be a local parametric representation by arc length in a neighborhood of b. Since $\psi(J)$ is open in M, we must have $\varphi(I) \cap \psi(J) \ne \emptyset$, so we can use Lemma 3.11 to extend $\varphi$. Since this is impossible, our assumption that $\varphi(I)$ is not closed in M does not hold up.

Now let us check what happens in this construction when $\varphi$ is not one to one. If $\varphi(t_0) = \varphi(t_1)$, then (one-dimensional tangent space, as usual) either $\varphi'(t_0) = \varphi'(t_1)$ or $\varphi'(t_0) = -\varphi'(t_1)$. First let us eliminate the second possibility. In this case Lemma 3.10 shows that we must have $\varphi(t_0 + t_1 - t) = \varphi(t)$ on $I \cap (t_0 + t_1 - I)$, for the two functions and their derivatives are the same at $t_0$. Differentiation gives $-\varphi'(t_0 + t_1 - t) = \varphi'(t)$, which is obviously impossible at the point $t = (t_0 + t_1)/2$.

Consequently, we must have $\varphi'(t_0) = \varphi'(t_1)$, and in this case Lemma 3.10 gives

$$\varphi(t) = \begin{cases} \varphi(t - t_0 + t_1) & \text{on } I \cap (t_0 - t_1 + I), \\ \varphi(t - t_1 + t_0) & \text{on } I \cap (t_1 - t_0 + I). \end{cases}$$

If I is not the whole line, then these two formulas permit the extension of $\varphi$ either to the right or the left. For instance, if $t_0 < t_1$, then the first formula extends $\varphi$ to the left to the interval $t_0 - t_1 + I$, and the second one extends $\varphi$ to the right. Since $\varphi$ cannot be extended, it must be true that $I = R^1$ and that $\varphi(t) = \varphi(t - t_0 + t_1)$ for all t. In this case, M must of course be compact, for the range of $\varphi$ is the same as the range of its restriction to the interval $[0, t_1 - t_0]$. With this additional information we can refine Theorem 3.12 as follows.

THEOREM 3.13 Let M be a connected noncompact smooth one-dimensional manifold. There is a local parametric representation by arc length $\varphi: I \to M$ with $\varphi(I) = M$ and $\varphi$ one to one.

THEOREM 3.14 Let M be a connected compact smooth one-dimensional manifold. There is a local parametric representation by arc length $\varphi: R^1 \to M$ with $\varphi(R^1) = M$. Moreover, there is a positive number $\alpha$ such that $\varphi$ is one to one on $(0, \alpha)$ and

$$\varphi(t + \alpha) = \varphi(t) \quad\text{for each } t.$$

Proof Theorem 3.13 is already proved, and so is the first part of Theorem 3.14. The number $\alpha$ is defined to be the lower bound of the numbers $t > 0$ with $\varphi(t) = \varphi(0)$. Note that when M is compact, $\varphi$ cannot be one to
one, for then the inverse would have to be continuous, while the range of the inverse is $R^1$, which is not compact. If $\varphi(t_0) = \varphi(t_1)$, then the discussion above shows that $\varphi(t - t_0 + t_1) = \varphi(t)$, so in particular $\varphi(t_1 - t_0) = \varphi(0)$. This shows that the set of numbers $t > 0$ with $\varphi(t) = \varphi(0)$ is not empty, so the definition of $\alpha$ makes sense. It also shows that if $t_1 > t_0$, then $t_1 - t_0 \ge \alpha$, so $\varphi$ is one to one on the interval $(0, \alpha)$. Finally, by continuity $\varphi(\alpha) = \varphi(0)$, so we can take $t_0 = 0$ and $t_1 = \alpha$ to obtain $\varphi(t + \alpha) = \varphi(t)$ for each t.

Exercise 9 Show that $\alpha \ne 0$. What is the geometric meaning of $\alpha$?

Exercise 10 Invent converses of Theorems 3.13 and 3.14; that is, if $\varphi: I \to R^n$ is regular at each point and is one to one and something else, then $\varphi(I)$ is a connected noncompact smooth one-dimensional manifold, and so on.

4 FUNCTIONS ON MANIFOLDS

Consider the problem of finding the maximum of the function $f(x, y, z) = 3x - 2y + z$ on the ball $x^2 + y^2 + z^2 \le 1$. We know that if a function $f: R^n \to R^1$ has a local maximum or minimum at a point a and if f is differentiable at a, then $df(a) = 0$, or, equivalently, $\nabla f(a) = 0$. In this case $\nabla f(a) = (3, -2, 1) \ne 0$. The conclusion is not that there is no maximum, for the closed ball is compact, and there certainly is one. The proper conclusion is that f is not differentiable at the point where the maximum occurs. The function $f(x, y, z) = 3x - 2y + z$ certainly looks differentiable, but the point is that we are looking at the restriction of this function to the closed ball, and the restriction is not differentiable at any boundary point, that is, at any point of the sphere $S^2 = \{(x, y, z): x^2 + y^2 + z^2 = 1\}$. The maximum does lie on the sphere, and at the moment we are helpless to locate it; so what we are going to do is discuss the idea of the differential of a function $f: M \to R^q$ which is defined on a smooth m-dimensional manifold $M \subset R^n$. (In this case $M = S^2$.)

Exercise 1 By accident, we are not really helpless in this particular problem. Use Cauchy-Schwarz to show that the maximum is at $(3, -2, 1)/\sqrt{14}$.


Throughout the section it is assumed that M is a smooth m-dimensional manifold in $R^n$ and that $f: M \to R^q$ is defined on a neighborhood in M of some point $a \in M$ (not on a neighborhood of a in $R^n$, but on a neighborhood of a in M, which is the intersection of M with a neighborhood of a in $R^n$). If $\varphi$ is a local parametric representation of M in a neighborhood of a with $\varphi(t_0) = a$, we shall call $f_\varphi = f \circ \varphi$ a local representation of f at $(t_0, a)$. Note that $f_\varphi$ is defined on a neighborhood of $t_0$.


LEMMA 4.1 Let $f_\varphi$ and $f_\psi$ be local representations of f at $(t_0, a)$ and at $(s_0, a)$. If $f_\varphi$ is differentiable at $t_0$, is of class $C^1$ at $t_0$, or is of class $C^1$ in a neighborhood of $t_0$, then the same is true of $f_\psi$ at $s_0$.

Proof Let P be a local inverse of $\varphi$, and set $g = P \circ \psi$, so that $\psi = \varphi \circ g$; hence

$$f_\psi = f_\varphi \circ g. \tag{1}$$

Since g is of class $C^1$ on a neighborhood of $s_0$ and $g(s_0) = t_0$, the lemma follows.

The lemma makes it reasonable to make the following definition.


DEFINITION 4.2 The function $f: M \to R^q$ is differentiable at a, or of class $C^1$ at a, or of class $C^1$ on a neighborhood of a, if some (and hence any) local representation $f_\varphi$ at $(t_0, a)$ has the same property at $t_0$.

There is another equally reasonable but quite different way to make the definition, which the following theorem shows is exactly the same.

THEOREM 4.3 The function $f: M \to R^q$ is differentiable at a, or of class $C^1$ at a, or of class $C^1$ on a neighborhood of a, if and only if there is an extension $\tilde f$ of f to a full neighborhood of a in $R^n$ with the same property.

Proof If there is an extension $\tilde f$ with one of the properties listed, and $\varphi$ is a local parametric representation, then $f_\varphi = f \circ \varphi = \tilde f \circ \varphi$; so it is apparent that $f_\varphi$ has the same property. On the other hand, if $f_\varphi$ has one of the properties, and P is the local inverse of $\varphi$, then $\tilde f = f_\varphi \circ P = f \circ \varphi \circ P$ is the required extension.

DEFINITION 4.4 If $f: M \to R^q$ is differentiable at a, then $df(a)$ is the restriction of $d\tilde f(a)$ to the tangent space $T_a(M)$, where $\tilde f$ is any differentiable extension of f.

A given function f has various extensions $\tilde f$, each with its own differential. It must be shown that all these have the same restriction to the tangent space $T_a(M)$. If $\varphi$ is any local parametric representation, then $f_\varphi = \tilde f \circ \varphi$; therefore,

$$df_\varphi(t_0) = d\tilde f(a)\, d\varphi(t_0). \tag{2}$$

Consequently,

$$df(a) = d\tilde f(a) = df_\varphi(t_0)\, d\varphi(t_0)^{-1} \quad\text{on } T_a(M). \tag{3}$$


This makes sense because $d\varphi(t_0)$ is one to one and maps $R^m$ onto the tangent space $T_a(M)$. It shows that any two extensions do have differentials with the same restriction to $T_a(M)$ because the same $\varphi$ can be used with all extensions.

Exercise 2 Start with formula (3) [without the $d\tilde f(a)$] as the definition of $df(a)$, and show that any two local parametric representations give the same result without using the extension $\tilde f$.

THEOREM 4.5 If $f: M \to R^1$ is differentiable at a, then there is a unique vector $\nabla f(a)$ in $T_a(M)$ such that

$$df(a)h = (\nabla f(a), h) \quad\text{for all } h \in T_a(M).$$

$\nabla f(a)$, which is called the gradient of f at a, is the projection of $\nabla \tilde f(a)$ on $T_a(M)$, where $\tilde f$ is any differentiable extension of f.

Proof If P is the projection on $T_a(M)$ and $h \in T_a(M)$, then

$$df(a)h = d\tilde f(a)h = (\nabla \tilde f(a), h) = (\nabla \tilde f(a), Ph) = (P\nabla \tilde f(a), h).$$

The uniqueness comes as usual from the fact that two distinct vectors in $T_a(M)$ cannot have the same inner product with every vector in $T_a(M)$.

THEOREM 4.6 If $f: M \to R^1$ has a local maximum or minimum at a, and f is differentiable at a, then $df(a) = 0$.

Proof If f has a local maximum or minimum at a, then $f_\varphi$ has the same at $t_0$, so by the original theorem of this kind, $df_\varphi(t_0) = 0$. Formula (3) shows that $df(a) = 0$.


In order to use the theorem for actual calculations, we shall have to know the equations of M, and we shall want to deal with some specific extension of f. So as not to multiply the notations, we shall call the extension f. Then we shall be interested in the local maxima and minima of the restriction of f to M, which is written $f|_M$. If $f|_M$ has a local maximum or minimum at a, and f is differentiable at a, then, according to Theorems 4.5 and 4.6, $\nabla f(a)$ is normal to M at a (for its projection on the tangent space is 0). Now if M is given by the equation F = 0 in a neighborhood of a, where $F: R^n \to R^{n-m}$ is regular at a, then the normal is the space spanned by $\nabla F_{m+1}(a), \ldots, \nabla F_n(a)$; so there exist numbers $\lambda_{m+1}, \ldots, \lambda_n$ such that

$$\nabla f(a) = \sum_{j=m+1}^{n} \lambda_j \nabla F_j(a). \tag{4}$$

The numbers $\lambda_{m+1}, \ldots, \lambda_n$ are called Lagrange multipliers.


THEOREM 4.7 (Lagrange Multipliers) Let M be a smooth manifold given by the equation F = 0 in a neighborhood of the point a, F regular at a. Let $f: R^n \to R^1$ be differentiable at a, and set

$$g(x, \lambda) = f(x) - \sum_{j=m+1}^{n} \lambda_j F_j(x).$$

If $f|_M$ has a local maximum or minimum at a, then $\nabla g(a, \lambda) = 0$ for some $\lambda = (\lambda_{m+1}, \ldots, \lambda_n)$.

Proof We are considering g as a function of all the 2n - m variables x and $\lambda$, and $\nabla g$ refers to the gradient in all these variables. Since $\partial g/\partial \lambda_j = -F_j$, the equations $\partial g/\partial \lambda_j = 0$ say simply that a lies on M. The equations $\partial g/\partial x_i = 0$ are just another way of writing (4).

Example Find the maximum and minimum of the function $f(x, y, z) = 3x - 2y + z$ on the ball $x^2 + y^2 + z^2 \le 1$.

At the beginning of the section we saw that both the maximum and minimum exist and that both must occur on the boundary $M = S^2 = \{(x, y, z): x^2 + y^2 + z^2 - 1 = 0\}$, so we consider Lagrange multipliers and the function

$$g(x, y, z, \lambda) = 3x - 2y + z - \lambda(x^2 + y^2 + z^2 - 1).$$

We have

$$\frac{\partial g}{\partial x} = 3 - 2\lambda x, \quad \frac{\partial g}{\partial y} = -2 - 2\lambda y, \quad \frac{\partial g}{\partial z} = 1 - 2\lambda z, \quad \frac{\partial g}{\partial \lambda} = -(x^2 + y^2 + z^2 - 1).$$

When we put $\nabla g = 0$, the first three equations give

$$x = \frac{3}{2\lambda}, \quad y = -\frac{1}{\lambda}, \quad z = \frac{1}{2\lambda}, \tag{5}$$

and then substitution in the last equation gives

$$\lambda = \pm\frac{\sqrt{14}}{2}.$$

Since the closed ball is compact, there must be both a maximum and a minimum. One must be given by (5) with $\lambda = \sqrt{14}/2$ and the other by (5) with $\lambda = -\sqrt{14}/2$. Calculation of the value of f at the two points shows that the former is the maximum and the latter the minimum.
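The result of this example can be checked numerically. The sketch below evaluates f at the two points given by formula (5) and compares the larger value with f at randomly sampled points of the sphere. The helper names `critical_point` and `random_sphere_point`, the sample size, and the random seed are illustrative assumptions, not part of the text.

```python
import math
import random

def f(p):
    x, y, z = p
    return 3 * x - 2 * y + z

def critical_point(lam):
    """Point (x, y, z) given by formula (5) for the multiplier lam."""
    return (3 / (2 * lam), -1 / lam, 1 / (2 * lam))

lam = math.sqrt(14) / 2
p_max = critical_point(lam)      # candidate for the maximum
p_min = critical_point(-lam)     # candidate for the minimum

def random_sphere_point(rng):
    """A random point of the unit sphere, obtained by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return tuple(c / n for c in v)

rng = random.Random(0)
sample_max = max(f(random_sphere_point(rng)) for _ in range(10000))
print(f(p_max), f(p_min), sample_max)
```

No sampled value should exceed $f$ at `p_max`, in agreement with the Cauchy-Schwarz argument of Exercise 1.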


Exercise 3 Find the distance from the point (-1, 1) to the curve xy = 1.

Exercise 4 Let $b \in R^n$ and let a be a point of a smooth manifold M that is closest to b. Show that b - a is normal to M at a. Does such a closest point always exist?

Exercise 5 Show that a self-adjoint linear $H: R^n \to R^n$ must have an eigenvalue (Theorem 9.3 of Chapter 9) by using Lagrange multipliers.

Exercise 6 Let M be the torus obtained by revolving the circle $x^2 + (y - 2)^2 = 1$ around the x axis. Use Lagrange multipliers to find the maximum and minimum of the function f(x, y, z) = z on M.

Let us consider this last exercise from the geometric point of view. According to the discussion preceding Theorem 4.7, we should look for the points where the gradient of f is normal to M. The gradient of f is

$$\nabla f = (0, 0, 1),$$

so we should look for points where the tangent plane to M is horizontal. The torus M looks as shown in Figure 2. The tangent plane is horizontal at the four points $(0, 0, \pm 3)$ and $(0, 0, \pm 1)$, and these are the four points the method of Lagrange multipliers will produce for you when you do the exercise. It is plain that (0, 0, 3) is the maximum point and (0, 0, -3) is the minimum point, and that the other two are neither maximum nor minimum points. In other
words, the method of Lagrange multipliers turns up the possibilities for the maximum and minimum points, but these possibilities have to be checked. The points (0, 0, 1) and (0, 0, -1) are called saddle points because the surface looks something like a saddle (or upside-down saddle) in a neighborhood of these points (Figure 3).

Figure 2

Figure 3

Exercise 7 Let $f: M \to R^q$ be differentiable at the point $a \in M$. Let N be a smooth manifold in $R^q$. If $f(M) \subset N$, then $df(a): T_a(M) \to T_b(N)$, $b = f(a)$.

5 QUADRATIC FORMS AND QUADRIC SURFACES

The quadratic forms are the simplest functions after the linear ones, and the quadric surfaces are the simplest surfaces after the planes. They can be analyzed in detail with the aid of the results of Section 9 of Chapter 9.

DEFINITION 5.1 A quadratic form on $R^n$ is a function $Q: R^n \to R^1$ of the form

$$Q(x) = (Tx, x), \tag{1}$$

where T is a linear transformation from $R^n$ to $R^n$.


If $\{a_{ij}\}$ is the matrix of T relative to any orthonormal basis of $R^n$, then (1) becomes

$$Q(x) = \sum_{i,j=1}^{n} a_{ij} x_i x_j. \tag{2}$$

Equation (1) does not determine the linear transformation T uniquely. Indeed,

$$(Tx, x) = (x, T^*x) = (T^*x, x),$$
so T and $T^*$ determine the same quadratic form Q, and so does the self-adjoint transformation

$$H = \frac{T + T^*}{2}.$$

THEOREM 5.2 The equation

$$Q(x) = (Hx, x) \tag{3}$$

determines a one-to-one correspondence between the quadratic forms and the self-adjoint linear transformations.

Proof What has to be shown is that equation (3) determines H uniquely. If (Hx, y) can be expressed in terms of Q, then this will do the trick, for two distinct vectors cannot have the same inner product with every vector y. (Hx, y) is expressed in terms of Q by the identity

$$4(Hx, y) = Q(x + y) - Q(x - y). \tag{4}$$

Exercise 1 Prove this identity when H is self-adjoint and satisfies (3) by simply writing out the right-hand side.

THEOREM 5.3 Let Q be a quadratic form with corresponding self-adjoint transformation H. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of H and let $e_1, \ldots, e_n$ be an orthonormal basis of eigenvectors. In coordinates relative to $e_1, \ldots, e_n$ we have

$$Q(x) = \sum_{i=1}^{n} \lambda_i x_i^2.$$

Exercise 2 Prove the theorem.

DEFINITION 5.4 A quadric surface in $R^n$ is the set of points that satisfy a quadratic equation

$$(Hx, x) + (x, b) + c = 0, \tag{5}$$

where $H: R^n \to R^n$ is self-adjoint, $b \in R^n$, and $c \in R^1$.


If $\{a_{ij}\}$ is the matrix of H relative to any orthonormal basis, and $b = (b_1, \ldots, b_n)$ relative to this basis, then (5) becomes

$$\sum_{i,j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n} b_i x_i + c = 0. \tag{6}$$


In terms of coordinates relative to an orthonormal basis of eigenvectors of H, this equation simplifies to

$$\sum_{i=1}^{n} \lambda_i y_i^2 + \sum_{i=1}^{n} \mu_i y_i + c = 0. \tag{7}$$

Exercise 3 What are the $\lambda_i$ and $\mu_i$ in terms of the original data in (5)?

Exercise 4 If Q is a quadratic form with corresponding self-adjoint H, then $\nabla Q(a) = 2Ha$.

Exercise 5 Show that the set of singular points (= nonregular points) of the quadric surface defined by (5) is either empty or is a plane contained in the plane parallel to the null space of H.

Formula (7) can be simplified still further by combining the quadratic and linear terms by completing the square. If $\lambda_i \ne 0$, then

$$\lambda_i y_i^2 + \mu_i y_i = \lambda_i\left(y_i + \frac{\mu_i}{2\lambda_i}\right)^2 - \frac{\mu_i^2}{4\lambda_i}.$$

Thus, if $\lambda_1, \ldots, \lambda_k$ are $\ne 0$ and the rest are 0, then equation (7) becomes

$$\sum_{i=1}^{k} \lambda_i (y_i + \alpha_i)^2 + \sum_{i=k+1}^{n} \mu_i y_i + d = 0, \tag{8}$$

where $\alpha_i = \mu_i/2\lambda_i$ and $d = c - \sum_{i=1}^{k} \mu_i^2/4\lambda_i$. A final coordinate change,

$$z_i = y_i + \alpha_i \ \text{ for } i \le k, \qquad z_i = y_i \ \text{ for } i > k, \tag{9}$$

gives

$$\sum_{i=1}^{k} \lambda_i z_i^2 + \sum_{i=k+1}^{n} \mu_i z_i + d = 0. \tag{10}$$

Note that the coordinate change (9) is not of the kind that we have been considering. The new axes are parallel to the old, but the origin is at the point $0 = z_i = y_i + \alpha_i$; that is, $y_i = -\alpha_i$, for $i \le k$. Such a coordinate change is called a translation. Note that it is not linear.

THEOREM 5.5 By means of a suitable choice of an orthonormal basis in $R^n$, and a translation, the equation (5) can be put in the form (10), where the $\lambda_i$ are the nonzero eigenvalues of H.
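The step from (7) to (8) rests on completing the square in each variable with a nonzero eigenvalue. As a small sketch, the hypothetical helper `complete_square` below returns the shift and the constant for one such variable, and the loop compares both sides of the identity at a few sample values; the coefficients 3 and -5 are arbitrary.

```python
def complete_square(lam, mu):
    """Return (alpha, const) with lam*y**2 + mu*y == lam*(y + alpha)**2 + const.

    Requires lam != 0, as in the passage from (7) to (8).
    """
    if lam == 0:
        raise ValueError("completing the square needs a nonzero coefficient lam")
    alpha = mu / (2 * lam)
    const = -mu * mu / (4 * lam)
    return alpha, const

lam, mu = 3.0, -5.0              # arbitrary illustrative coefficients
alpha, const = complete_square(lam, mu)
for y in (-2.0, 0.0, 0.75, 4.0):
    left = lam * y * y + mu * y
    right = lam * (y + alpha) ** 2 + const
    print(y, left, right)
```

Summing the constants over $i \le k$ and adding c gives the number d that appears in (8) and (10).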


Study of formula (10) shows that the nature of the quadric surface depends on the number of positive $\lambda_i$, the number of negative $\lambda_i$, the number of nonzero $\mu_i$, and the sign of the number d. For example, if there are n positive $\lambda_i$, then there can be no $\mu_i$; and the surface is empty if d > 0, consists of a single point if d = 0, and is an ellipsoid if d < 0.

In two dimensions the complete analysis is immediate (Figure 4). When d = 0 the ellipse becomes a single point, and the hyperbola becomes a pair of lines. When $\mu_2 = 0$, the parabola becomes either empty, or a single line, or a pair of parallel lines, according as d > 0, d = 0, or d < 0.

Figure 4: ellipse ($\lambda_1 > 0$, $\lambda_2 > 0$); hyperbola ($\lambda_1 > 0$, $\lambda_2 < 0$); parabola ($\lambda_1 > 0$, $\lambda_2 = 0$)

Exercise 6 Discuss the various quadric surfaces in $R^3$.

The actual reduction of a given equation (5) to the form (10) is easy enough in two dimensions, but beyond that it is pretty cumbersome.

Example Discuss $x^2 - 2xy - y^2 + x - 6 = 0$.


The matrices of H and of $H - \lambda I$ are

$$\begin{pmatrix} 1 & -1 \\ -1 & -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 - \lambda & -1 \\ -1 & -1 - \lambda \end{pmatrix}.$$

The eigenvalues are the numbers $\lambda$ for which the latter is not one to one, that is, for which the determinant

$$D(\lambda) = (1 - \lambda)(-1 - \lambda) - 1 = \lambda^2 - 2$$

is zero. These are $\lambda_1 = \sqrt{2}$ and $\lambda_2 = -\sqrt{2}$. From this we can tell already that the curve is a hyperbola because one of the eigenvalues is positive and the other negative; and we can tell that the quadratic part of the new equation will be $\sqrt{2}\, z^2 - \sqrt{2}\, w^2$. But to get the rest of the equation and to see where the new axes go, we have to push on with the calculation.

To get the eigenvector $e_1$ corresponding to $\lambda_1 = \sqrt{2}$ we have to solve the equations

$$h - k = \sqrt{2}\, h \quad\text{and}\quad -h - k = \sqrt{2}\, k,$$
but it is sufficient to solve just one of them, and then the other will be automatically satisfied.

Exercise 7 Why?


Taking h = 1, we get $k = 1 - \sqrt{2}$, but to get $e_1$ we should normalize this by dividing by its length. Thus,

$$e_1 = \frac{1}{\sqrt{4 - 2\sqrt{2}}}\,(1,\ 1 - \sqrt{2}), \qquad e_2 = \frac{1}{\sqrt{4 - 2\sqrt{2}}}\,(\sqrt{2} - 1,\ 1).$$

Note that $e_2$ comes free because it is perpendicular to $e_1$ and has length one. If a given vector v has coordinates (x, y) relative to the initial basis $f_1, f_2$ of $R^2$, and coordinates (z, w) relative to the new basis, then

$$x = (v, f_1) = z(e_1, f_1) + w(e_2, f_1),$$
$$y = (v, f_2) = z(e_1, f_2) + w(e_2, f_2).$$

Hence,

$$x = \frac{1}{\sqrt{4 - 2\sqrt{2}}}\,\bigl(z + (\sqrt{2} - 1)w\bigr), \qquad y = \frac{1}{\sqrt{4 - 2\sqrt{2}}}\,\bigl((1 - \sqrt{2})z + w\bigr). \tag{11}$$

Now we go back to the original equation and substitute these values of x and y to get

$$\sqrt{2}\, z^2 - \sqrt{2}\, w^2 + \frac{1}{\sqrt{4 - 2\sqrt{2}}}\, z + \frac{\sqrt{2} - 1}{\sqrt{4 - 2\sqrt{2}}}\, w - 6 = 0. \tag{12}$$

(This part of the calculation looks fearsome, but remember that we already know the quadratic part and can ignore it completely.) At this point the equation is in the form (7), and all that remains is to complete the square to arrive at (8) and then to make the substitution (9).
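The calculation in this example can be checked numerically. The sketch below verifies that the vectors $e_1$ and $e_2$ are orthonormal eigenvectors of H for the eigenvalues $\sqrt{2}$ and $-\sqrt{2}$, and that the substitution (11) turns the original equation into (12) at an arbitrary test point. The helper names and the test point are illustrative assumptions.

```python
import math

H = [[1.0, -1.0], [-1.0, -1.0]]      # matrix of the quadratic part x^2 - 2xy - y^2

def apply(M, v):
    """Matrix-vector product for a 2 x 2 matrix."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

r2 = math.sqrt(2.0)
N = math.sqrt(4.0 - 2.0 * r2)        # length of the vector (1, 1 - sqrt 2)
e1 = [1.0 / N, (1.0 - r2) / N]
e2 = [(r2 - 1.0) / N, 1.0 / N]

def curve_xy(x, y):
    """Left side of the original equation."""
    return x * x - 2.0 * x * y - y * y + x - 6.0

def curve_zw(z, w):
    """Left side of equation (12) in the new coordinates."""
    return r2 * z * z - r2 * w * w + z / N + (r2 - 1.0) * w / N - 6.0

z, w = 0.8, -1.7                      # arbitrary test point in the new coordinates
x = z * e1[0] + w * e2[0]             # substitution (11)
y = z * e1[1] + w * e2[1]
print(apply(H, e1), [r2 * c for c in e1])
print(curve_xy(x, y), curve_zw(z, w))
```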


Exercise 8 Carry out the rest of the calculation. 


Consider the general equation

$$ax^2 + bxy + cy^2 + dx + ey + f = 0 \tag{13}$$

in dimension 2. The matrices of H and $H - \lambda I$ are

$$\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} a - \lambda & b/2 \\ b/2 & c - \lambda \end{pmatrix}.$$

The eigenvalues are the numbers $\lambda_1$ and $\lambda_2$ for which

$$D(\lambda) = (a - \lambda)(c - \lambda) - \frac{b^2}{4} = \lambda^2 - (a + c)\lambda + ac - \frac{b^2}{4}$$

is 0. Hence, also $D(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2)$; therefore,

$$\lambda_1 + \lambda_2 = a + c \quad\text{and}\quad 4\lambda_1\lambda_2 = 4ac - b^2. \tag{14}$$

The curve is an ellipse exactly when both eigenvalues have the same sign, that is, when $4ac - b^2 > 0$. It is a hyperbola when they have different signs, that is, when $4ac - b^2 < 0$. And it is a parabola when one is 0, that is, when $4ac - b^2 = 0$.

THEOREM 5.6 If a, b, and c are not all three 0 in equation (13), then the equation represents an ellipse if $b^2 - 4ac < 0$, a hyperbola if $b^2 - 4ac > 0$, and a parabola if $b^2 - 4ac = 0$. The number $b^2 - 4ac$ is called the discriminant of the equation.

(Of course, allowance must be made for the degenerate cases when the ellipse collapses to a point, and so on.)
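The classification by the discriminant is easy to mechanize. The function `classify_conic` below is a hypothetical helper, given only as a sketch: it looks at the quadratic coefficients a, b, c of equation (13) and, like the theorem, makes no attempt to detect the degenerate cases.

```python
def classify_conic(a, b, c):
    """Classify a*x^2 + b*x*y + c*y^2 + (lower-order terms) = 0 by the sign of b^2 - 4ac.

    Degenerate cases (a point, a pair of lines, the empty set) are not detected.
    """
    if a == 0 and b == 0 and c == 0:
        raise ValueError("a, b, c must not all be zero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return "ellipse"
    if disc > 0:
        return "hyperbola"
    return "parabola"

# The example x^2 - 2xy - y^2 + x - 6 = 0 treated earlier has a = 1, b = -2, c = -1.
print(classify_conic(1, -2, -1))
```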


12 Higher Derivatives

1 SECOND DERIVATIVES

The partial derivatives of a function $f: R^n \to R^k$ are again functions from $R^n$ to $R^k$, which may have partial derivatives of their own. If so, the latter are called the second partial derivatives:

$$\frac{\partial^2 f(a)}{\partial x_i\, \partial x_j} = D_{ij}f(a) = D_i(D_j f)(a). \tag{1}$$

The formula says that to get $D_{ij}f(a)$, you take first $D_j f$, which must exist on a neighborhood of a, and differentiate it with respect to $x_i$. It appears that this would be quite different from $D_{ji}f(a)$, which is formed by differentiating first with respect to $x_i$ and then with respect to $x_j$. In fact, the two are usually the same.

THEOREM 1.1 If $D_i f$ and $D_j f$ are both differentiable at a, then $D_{ij}f(a) = D_{ji}f(a)$.

Exercise 1 Show that it is enough to treat the case where $f: R^2 \to R^1$.

Proof Let $f: R^2 \to R^1$, and write (a, b) instead of $(a_1, a_2)$, and so on. Consider the quantity

$$F(h) = f(a + h, b + h) - f(a + h, b) - f(a, b + h) + f(a, b).$$

If $\varphi(x) = f(x, b + h) - f(x, b)$, then $F(h) = \varphi(a + h) - \varphi(a)$, and application of the mean-value theorem to $\varphi$ gives

$$F(h) = \varphi'(\xi)h = [D_1 f(\xi, b + h) - D_1 f(\xi, b)]h,$$
where $\xi$ is between a and a + h. The fact that $D_1 f$ is differentiable at (a, b) gives

$$D_1 f(\xi, b + h) = D_1 f(a, b) + D_{11}f(a, b)(\xi - a) + D_{21}f(a, b)h + \varepsilon_1(h)|h|,$$
$$D_1 f(\xi, b) = D_1 f(a, b) + D_{11}f(a, b)(\xi - a) + \varepsilon_2(h)|h|.$$

Therefore,

$$F(h) = D_{21}f(a, b)h^2 + \varepsilon(h)|h|^2. \tag{2}$$

On the other hand, if $\psi(y) = f(a + h, y) - f(a, y)$, then $F(h) = \psi(b + h) - \psi(b)$, and a repetition of the above argument gives

$$F(h) = D_{12}f(a, b)h^2 + \varepsilon(h)|h|^2. \tag{3}$$

From (2) and (3) it follows that

$$D_{21}f(a, b) = \lim_{h \to 0} \frac{F(h)}{h^2} = D_{12}f(a, b).$$

Exercise 2 Give an example of an $f: R^2 \to R^1$ for which $D_{12}f(a)$ and $D_{21}f(a)$ both exist but are different.

Exercise 3 If $D_2 f$ and $D_{21}f$ exist on a neighborhood of a, and $D_{21}f$ is continuous at a, then $D_{12}f$ exists at a and is equal to $D_{21}f$.

2 HIGHER DERIVATIVES

The third derivatives are the derivatives of the second, the fourth are the derivatives of the third, and so on. The general definition is best put inductively. The notion to be defined is the derivative $D_i f$, where i is now not a single integer between 1 and n but a finite sequence $i = (i_1, \ldots, i_r)$. The number r is called the order of the derivative and is written r = |i|.

DEFINITION 2.1 If r > 1 and $i = (i_1, \ldots, i_r)$, then

$$D_i f = D_{i_1}(D_{i'} f), \quad\text{where } i' = (i_2, \ldots, i_r).$$


For technical reasons it is convenient to write 0 for the empty sequence i and to set $D_0 f = f$. This avoids exceptional cases in many formulas.

A function is of class $C^r$ at a point if the partial derivatives of orders $\le r$ all exist on a neighborhood of the point and are continuous at the point itself. Again the definition is best put inductively.


DEFINITION 2.2 A function $f: R^n \to R^k$ is of class $C^r$ at a point a if f and df are of class $C^{r-1}$ at a. It is of class $C^r$ on an open set if it is of class $C^r$ at each point of the open set.

This determines $C^r$ for every r > 1, because $C^1$ is already known. Again, however, it is convenient to invent something for r = 0. The useful notion is that f is of class $C^0$ at a if it is defined on a neighborhood of a and is continuous at a.

Exercise 1 Show that Definition 2.2 gives back the original notion of $C^1$, if $C^0$ is defined as above.

From the inductive nature of the definitions it is clear that most proofs should rest on induction too.

THEOREM 2.3 If $f, g: R^n \to R^k$ are of class $C^r$ at a, and $\alpha$ is a real number, then f + g and $\alpha f$ are of class $C^r$ at a, and

$$D_i(f + g) = D_i f + D_i g \quad\text{and}\quad D_i(\alpha f) = \alpha D_i f \quad\text{if } |i| \le r.$$

Exercise 2 Prove the theorem.

THEOREM 2.4 If $f: R^n \to R^1$ and $g: R^n \to R^k$ are of class $C^r$ at a, then the product $fg: R^n \to R^k$ is of class $C^r$ at a.

Exercise 3 Prove the theorem.

THEOREM 2.5 Let $R^n \xrightarrow{f} R^m \xrightarrow{g} R^k$. If f is of class $C^r$ at a and g is of class $C^r$ at b = f(a), then $g \circ f$ is of class $C^r$ at a.

Exercise 4 Prove the theorem.

One consequence of Theorem 2.5 is that the notion of class $C^r$ does not depend on the coordinate system in $R^n$. Let F be a given function on $R^n$, and suppose that a given vector v has coordinates x with respect to one basis and coordinates $y = \varphi(x)$ with respect to another. In terms of the first coordinates, the function F determines a function f by f(x) = F(v), and in terms of the second it determines a function g by g(y) = F(v). Thus,

$$g(y) = f(x) \quad\text{if } y = \varphi(x),$$

or, in other words, $f = g \circ \varphi$. In this case $\varphi$ is linear, so it is clearly $C^r$ for every r, and the same is true of its inverse. Therefore, the theorem shows that f is $C^r$ if and only if g is.


Exercise 5 Let $f: R^n \to R^k$. In the discussion above we made a change of coordinates in $R^n$. Now make a change of coordinates in $R^k$ and show that the notion of class $C^r$ remains the same.

Consider the space $\mathcal{L}_{mn}$ of linear transformations from $R^m$ to $R^n$. As soon as bases are fixed in $R^m$ and $R^n$, $\mathcal{L}_{mn}$ can be identified with the space of matrices, and this in turn with $R^{mn}$. This amounts to choosing a basis in $\mathcal{L}_{mn}$ in the following way. If $e_1, \ldots, e_m$ is the basis in $R^m$, and $f_1, \ldots, f_n$ is the basis in $R^n$, let $T_{ij}$ be the linear transformation such that

$$T_{ij}e_j = f_i, \qquad T_{ij}e_k = 0 \quad\text{if } k \ne j.$$

This is the linear transformation whose matrix has 1 in the ith row, jth column, and 0 everywhere else. It is plain that

$$T = \sum_{i,j} a_{ij}T_{ij}$$

if and only if T has matrix $\{a_{ij}\}$. In other words, the $T_{ij}$ form a basis of $\mathcal{L}_{mn}$, and the matrix of a linear transformation is nothing but its set of coordinates relative to the basis.

Once $\mathcal{L}_{mn}$ is identified with $R^{mn}$ it makes perfectly good sense to speak of functions of class $C^r$ from $R^k$ to $\mathcal{L}_{mn}$, from $\mathcal{L}_{mn}$ to $R^k$, from $\mathcal{L}_{mn}$ to $\mathcal{L}_{pq}$, and so on. The above discussion shows that how the bases are chosen in the various spaces is immaterial. To be sure the ideas are fixed clearly, let us state the definition explicitly in one case.

DEFINITION 2.6 Let bases be fixed in $R^m$ and $R^n$. A function $f: R^k \to \mathcal{L}_{mn}$ is of class $C^r$ at a point a if each element of its matrix is of class $C^r$ at a.

Of course, the elements of the matrix are functions $f_{ij}: R^k \to R^1$. If the bases are changed, then these functions are changed; but the discussion above shows that if they are of class $C^r$ for one pair of bases, then they are of class $C^r$ for any other.

THEOREM 2.7 Let $f: R^m \to R^n$ be differentiable at each point of an open set G. Then f is of class $C^r$ on G if and only if $df: R^m \to \mathcal{L}_{mn}$ is of class $C^{r-1}$ on G.

Proof The elements of the matrix are the first derivatives $D_j f_i$, so the result follows straight from Definitions 2.6 and 2.2. The theorem looks like nothing more than a ponderous tautology, which it is, but watch what happens in Section 3.


Exercise 6 Let $f, g: R^k \to \mathcal{L}_{nn}$. If f and g are of class $C^r$ at a point a, then h(x) = f(x)g(x) is also of class $C^r$ at a.

Remark Another notation for the derivative $D_i f$ is

$$\frac{\partial^r f}{\partial x_{i_1}\, \partial x_{i_2} \cdots \partial x_{i_r}}.$$

If f is of class $C^r$, then all the differentiations with respect to $x_1$ can be lumped together, all those with respect to $x_2$ can be lumped together, and so on. In this case the notation is often

$$\frac{\partial^r f}{\partial x_1^{k_1}\, \partial x_2^{k_2} \cdots \partial x_n^{k_n}}, \quad\text{where } r = \sum k_j.$$

Exercise 7 Every closed set in $R^n$ is the set of zeros of a $C^\infty$ function. (Hint: This is the extension of Exercise 18, Section 2, Chapter 11, to the $C^\infty$ case, and the proof is almost the same. All that is necessary is to take each $f_k$ of class $C^\infty$ and the $a_k$ so small that the series can be differentiated any number of times without losing the uniform convergence. This can be accomplished by redefining $M_k$ to be the maximum of $f_k$ and all its derivatives through order k.)

3 THE INVERSE- AND IMPLICIT-FUNCTION THEOREMS

THEOREM 3.1 Suppose in the inverse-function theorem that the function f is of class $C^r$ on a neighborhood of a, with $r \ge 1$. Then the inverse is of class $C^r$ on a neighborhood of b = f(a).

THEOREM 3.2 Suppose in the implicit-function theorem that the function F is of class $C^r$ on a neighborhood of (a, b), with $r \ge 1$. Then the solution $\varphi$ is of class $C^r$ on a neighborhood of a.

It is enough to treat the inverse-function theorem because the other was derived from it. If $\Phi: \mathcal{L}_{nn} \to \mathcal{L}_{nn}$ is the inverse map, that is, $\Phi(T) = T^{-1}$, then the composite-function formula gives

$$d\varphi = \Phi \circ df \circ \varphi, \tag{1}$$

where $\varphi$ denotes the inverse of f. From the inverse-function theorem itself we know that $\varphi$ is of class $C^0$, so we are in a position to use induction. (Actually we know that $\varphi$ is of class $C^1$, since we already carried out the first step of the induction in Section 7 of Chapter 10.)


LEMMA 
3.3 


Proof 


Exercise 1 


Exercise 2 
Exercise 3 


Exercise 4 




Assume for the moment that $\Phi$ is of class $C^{r-1}$. If we take as an inductive
hypothesis that $\varphi$ is of class $C^{r-1}$, then formula (1) and Theorem 2.5 show that
$d\varphi$ is of class $C^{r-1}$, and then Theorem 2.7 shows that $\varphi$ is of class $C^r$, and we are
done.


$\Phi$ is of class $C^r$ for every r.


We shall calculate $d\Phi$. If A and B are invertible linear transformations,
then

$$A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}.$$

Therefore, $A^{-1} = B^{-1} + A^{-1}(B - A)B^{-1}$, and if this is put back into the
first formula, we get

$$A^{-1} - B^{-1} = B^{-1}(B - A)B^{-1} + A^{-1}(B - A)B^{-1}(B - A)B^{-1}.$$

Taking A = X + H and B = X, we find

$$(X + H)^{-1} - X^{-1} = -X^{-1}HX^{-1} + (X + H)^{-1}HX^{-1}HX^{-1}. \tag{2}$$

From this we shall conclude that

$$d\Phi(X)H = -X^{-1}HX^{-1}. \tag{3}$$

First of all, the function $F(H) = -X^{-1}HX^{-1}$ is certainly linear in H.
(X is fixed here.) Therefore, what has to be shown is that

$$\frac{\|(X + H)^{-1}HX^{-1}HX^{-1}\|}{\|H\|} \to 0 \quad \text{as } H \to 0. \tag{4}$$


Show that if $\|H\| < \tfrac{1}{2}\|X^{-1}\|^{-1}$, then

$$\|(X + H)^{-1}HX^{-1}HX^{-1}\| \le 2\|X^{-1}\|^3 \|H\|^2. \tag{5}$$

(Hint: If necessary, look back at Theorem 8.8 of Chapter 9.)


From formula (3) we can read off the lemma by induction. If we take
as an inductive hypothesis that $\Phi$ is of class $C^{r-1}$, then formula (3) and
Exercise 6 of Section 2 show that $d\Phi$ is of class $C^{r-1}$, and therefore that
$\Phi$ is of class $C^r$.
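Formula (3) for the derivative of the inverse map can be checked numerically. The following is only a sketch for 2 by 2 matrices in plain Python; the helper names (`mat_mul`, `mat_inv`, `linearization_error`) and the sample matrices are illustrative assumptions, not part of the text. It measures how far $(X + tH)^{-1} - X^{-1}$ is from the linear term $-X^{-1}(tH)X^{-1}$, divided by the size of $tH$, which according to (4) should tend to 0 with t.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(A):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_add(A, B, scale=1.0):
    """A + scale * B, entry by entry."""
    return [[A[i][j] + scale * B[i][j] for j in range(2)] for i in range(2)]

def norm(A):
    """Largest absolute entry; any norm would serve for this check."""
    return max(abs(x) for row in A for x in row)

def linearization_error(X, H):
    """Size of (X+H)^{-1} - X^{-1} - (-X^{-1} H X^{-1})."""
    Xi = mat_inv(X)
    exact = mat_add(mat_inv(mat_add(X, H)), Xi, scale=-1.0)
    linear = mat_mul(mat_mul(Xi, H), Xi)      # X^{-1} H X^{-1}
    return norm(mat_add(exact, linear))       # exact - (-linear)

# Sample data (hypothetical): X is invertible, H is a fixed direction.
X = [[2.0, 1.0], [1.0, 3.0]]
H = [[0.3, -0.1], [0.2, 0.4]]

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    tH = [[t * h for h in row] for row in H]
    ratios.append(linearization_error(X, tH) / norm(tH))
print(ratios)
```

The printed ratios shrink roughly in proportion to t, which is what the quadratic bound in formula (5) predicts.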


Does this induction begin all right?
$d\Phi$ is a function from where to where?


Use formula (6), Section 11 of Chapter 9, to prove Theorem 3.1 directly without
using $\Phi$ at all.




DEFINITION 
3.4 


Exercise 5 


Exercise 6 


The basic tool in the study of manifolds in the last chapter was simply the 
implicit-function theorem. Now that we have this theorem for class $C^r$, we can
review the theory of the last chapter from the point of view of manifolds of
class $C^r$.


A set $M \subset R^n$ is a smooth m-dimensional manifold of class $C^r$ if for each
point $a \in M$ there is a function $F: R^n \to R^{n-m}$ that is regular and of class
$C^r$ on a neighborhood of a and such that M = {x : F(x) = 0} in a neighborhood
of a.


In terms of this definition, the original smooth manifolds are simply the
smooth manifolds of class $C^1$.

All the original theorems remain valid in this setting: a smooth manifold
of class $C^r$ is locally the graph of a $C^r$ function, it has local parametric
representations that are of class $C^r$ and have local inverses of class $C^r$, and so on. The
original proofs remain valid, too. The only additional fact that is needed is
the fact that the implicit-function theorem works for $C^r$.


Go through the definitions and theorems of Chapter 11 and restate them for
smooth manifolds of class $C^r$.


Since the theory of smooth manifolds of class $C^r$ appears to involve nothing
new, the question arises as to why they should be considered at all. There are 
many interesting problems that do depend on higher derivatives. We simply 
have not encountered them. 


Let $f: R^1 \to R^1$ be a function that is of class $C^1$ but not $C^2$; for example, $f(x) = x^2$
for $x \ge 0$ and $f(x) = -x^2$ for $x < 0$. Show that the graph of f is a manifold
of class $C^1$ but not of class $C^2$.


TAYLOR’S FORMULA 


Once again Taylor’s formula shows how to approximate general functions by 
polynomials. The idea is to apply the ordinary Taylor’s formula to the function 


g(t) = f(a + th), 


where f is a given function from $R^n$ to $R^1$, and a and h are given points in $R^n$.
Let us do it and see what happens. 


$$g(1) = \sum_{k=0}^{r} \frac{g^{(k)}(0)}{k!} + \frac{g^{(r+1)}(\tau)}{(r+1)!}, \tag{1}$$


THEOREM 
4.1 


Exercise 1 




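where $\tau$ is between 0 and 1. Therefore, we have to calculate the derivatives
of g. Note that $g = f \circ \varphi$, with $\varphi(t) = a + th$, so

$$g'(t) = \sum_{j=1}^{n} D_j f(a + th)\, \varphi_j'(t) = \sum_{j=1}^{n} D_j f(a + th)\, h_j.$$

By the same argument

$$g''(t) = \sum_{i,j=1}^{n} D_{ij} f(a + th)\, h_i h_j$$

and, in general,

$$g^{(k)}(t) = \sum D_{i_1 \cdots i_k} f(a + th)\, h_{i_1} h_{i_2} \cdots h_{i_k},$$

where $i_1, \ldots, i_k$ all vary from 1 to n. If we write

$$h^i = h_{i_1} h_{i_2} \cdots h_{i_k}, \tag{2}$$

the formula becomes simply

$$g^{(k)}(t) = \sum_{|i| = k} D_i f(a + th)\, h^i, \tag{3}$$

and formula (1) becomes

$$f(a + h) = \sum_{|i| \le r} \frac{1}{|i|!}\, D_i f(a)\, h^i + \frac{1}{(r+1)!} \sum_{|i| = r+1} D_i f(a + \tau h)\, h^i, \tag{4}$$

where $0 < \tau < 1$.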


(Taylor's Formula) If $f: R^n \to R^1$ is of class $C^{r+1}$ at each point of the
line segment from a to x, then there is a point $\xi$ on this segment such that

$$f(x) = \sum_{|i| \le r} \frac{1}{|i|!}\, D_i f(a)\,(x - a)^i + \sum_{|i| = r+1} \frac{1}{(r+1)!}\, D_i f(\xi)\,(x - a)^i. \tag{5}$$

The first sum is the Taylor polynomial $T_a^r f$ of f of order r at the point a.
The second sum is the remainder.
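The sum over index tuples in Taylor's formula is easy to carry out mechanically. The sketch below is only an illustration under stated assumptions: the function $f(x, y) = e^x \sin y$ is a hypothetical choice (its partial derivatives are known in closed form), indices run over 0 and 1 rather than 1 and 2, and the names `partial` and `taylor_polynomial` are invented for the example. Each tuple $i = (i_1, \ldots, i_k)$ contributes $D_i f(a)\, h^i / k!$, exactly as in the first sum of the formula.

```python
import math
from itertools import product

def partial(i, point):
    """D_i f for f(x, y) = exp(x) * sin(y); i is a tuple of indices in {0, 1}.

    Differentiating in x leaves exp(x) unchanged; each differentiation in y
    shifts the argument of the sine by pi/2.
    """
    x, y = point
    q = sum(1 for idx in i if idx == 1)       # number of y-differentiations
    return math.exp(x) * math.sin(y + q * math.pi / 2)

def taylor_polynomial(a, h, r):
    """Sum over all index tuples i with |i| <= r of D_i f(a) h^i / |i|!."""
    total = 0.0
    for k in range(r + 1):
        for i in product(range(2), repeat=k):
            h_i = math.prod(h[idx] for idx in i)   # h^i = h_{i_1} ... h_{i_k}
            total += partial(i, a) * h_i / math.factorial(k)
    return total

a = (0.0, 0.0)
h = (0.1, 0.2)
approx = taylor_polynomial(a, h, 3)
exact = math.exp(a[0] + h[0]) * math.sin(a[1] + h[1])
print(approx, exact, abs(approx - exact))
```

The difference between the two printed values is the remainder, the second sum in the formula, which for this f is bounded by $e^{0.1}(|h_1| + |h_2|)^4 / 4!$.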


Show that the hypothesis that f is of class $C^{r+1}$ justifies the calculations that
went into the proof of the theorem. (Actually, a little less will do, for example,
class $C^r$ plus differentiability of the derivatives of order r.)




5


THEOREM 
5.1 


Remark 


Proof 


LOCAL MAXIMA AND MINIMA 


In the case of functions of one variable we were able to use Taylor’s formula to 
obtain a rather complete test for local maxima and minima. The only situa- 
tions left open were those in which the derivatives fail to exist, or else all vanish. 
The same argument gives interesting results for functions of several variables; 
but they are less complete, and the calculations are likely to be fearsome. 

Let $f: R^n \to R^1$ be of class $C^r$ on a neighborhood of a. Suppose that all
derivatives of order > 0 and < r vanish at a, but at least one of order r does not
vanish, and define

$$Q_r(h) = \frac{1}{r!} \sum_{|i| = r} D_i f(a)\, h^i. \tag{1}$$

From Taylor's formula in the form (5) of Section 4, and the fact that the
derivatives of order r are continuous at a, it results that for every positive $\varepsilon$ there is a
positive $\delta$ such that

$$|f(a + h) - f(a) - Q_r(h)| \le \varepsilon |h|^r \quad \text{if } |h| < \delta;$$

hence

$$Q_r(h) - \varepsilon |h|^r \le f(a + h) - f(a) \le Q_r(h) + \varepsilon |h|^r \quad \text{if } |h| < \delta. \tag{2}$$


This formula suggests that the sign of f(a + h) − f(a) (i.e., whether f has a
maximum or minimum at a) depends on the sign of $Q_r$.


f has a local minimum at a if $Q_r(h) > 0$ for every $h \ne 0$, a local maximum
at a if $Q_r(h) < 0$ for every $h \ne 0$, and neither one if $Q_r$ is positive at some
h and negative at others.


What the theorem does is to reduce the study of f to the study of the much
simpler function $Q_r$. But it leaves open the case where $Q_r$ is nonnegative for
all h, but not actually positive, and also the case where $Q_r$ is nonpositive for all h,
but not actually negative. For example, both functions $f(x, y) = x^2 + y^4$ and
$g(x, y) = x^2 + y^3$ vanish together with the first derivatives at the origin, and
they have the same $Q_2$, that is, $Q_2(h, k) = h^2$, which is $\ge 0$ but not $> 0$ for all
(h, k). Obviously, f has a minimum at the origin, and g does not.


The important fact to notice is that $Q_r$ is homogeneous of degree r, that is,

$$Q_r(th) = t^r Q_r(h), \tag{3}$$

which is perfectly obvious from the definition in formula (1).

First suppose that $Q_r(h) > 0$ for all $h \ne 0$. Since the unit sphere
S(0; 1) is compact, $Q_r$ has a minimum m on this set, and m must be positive,
because it is a minimum and not just a lower bound. For every $h \ne 0$,


Exercise 1 




h/|h| is on the unit sphere, so we have $Q_r(h/|h|) \ge m$. Then formula (3)
gives

$$Q_r(h) \ge m|h|^r, \tag{4}$$

and this holds for all h, since both sides are 0 for h = 0. Taking $\varepsilon < m$
and using formula (2), we get

$$f(a + h) - f(a) \ge (m - \varepsilon)|h|^r \quad \text{if } |h| < \delta, \tag{5}$$

which shows that f has a strict local minimum at a.


Write out the proof of the fact that f has a strict local maximum at a if $Q_r(h) < 0$
for all $h \ne 0$.


Suppose that $Q_r(h_1) > 0$ and $Q_r(h_2) < 0$. According to formulas (2) and
(3), we have

$$f(a + th_1) - f(a) \ge t^r \bigl(Q_r(h_1) - \varepsilon |h_1|^r\bigr) \quad \text{if } |th_1| < \delta.$$

First choose $\varepsilon$ so that $Q_r(h_1) - \varepsilon |h_1|^r > 0$ and find the corresponding $\delta$. If
t > 0 and small enough so that $|th_1| < \delta$, then $f(a + th_1) - f(a) > 0$; this
means that f cannot have a local maximum at a.


Exercise 2 Use $h_2$ in the same way to show that f cannot have a local minimum at a.


Exercise 3 


THEOREM 
5.2 


Proof 


If r is odd, then there are always points where $Q_r$ is positive and others where it
is negative, so there is never a local maximum or minimum.


Although $Q_r$ is generally simpler than f, it is still a difficult job to decide
whether such a function is always positive. The case r = 2 is interesting because
$Q_2$ is a quadratic form. A quadratic form Q is positive for all $h \ne 0$ if and only
if the corresponding self-adjoint linear transformation H is strictly positive
definite, and this is true if and only if all the eigenvalues are positive. Calculation
of the eigenvalues is still hard, though, so it is worthwhile to have a theorem
that is easier to manage, at least in low dimensions.


Let $H: R^n \to R^n$ be self-adjoint with matrix A relative to an orthonormal
basis. Let $A_k$ be the k by k matrix in the upper left-hand corner of A. Then
H is strictly positive definite if and only if $\det A_k > 0$ for k = 1, . . . , n.


Let $V_k$ be the space spanned by the first k basis vectors, let $P_k$ be the
projection on $V_k$, and let $K_k$ be the restriction of $P_k H P_k$ to $V_k$. Then $A_k$ is
nothing but the matrix of $K_k$, and the determinant of $A_k$ is the product
of the eigenvalues of $K_k$.




Exercise 4 


Exercise 5 


Exercise 6 


Suppose first that H is strictly positive definite. For $x \in V_k$, $x \ne 0$,
we have

$$(K_k x, x) = (P_k H P_k x, x) = (H P_k x, P_k x) = (Hx, x) > 0.$$

This shows that $K_k$ is strictly positive definite, so all its eigenvalues are
positive; therefore, their product is positive.
To go the other way we shall use induction and Exercise 13, Section 9
of Chapter 9. Let $\lambda_1 \ge \cdots \ge \lambda_n$ be the eigenvalues of H, and let
$\mu_1 \ge \cdots \ge \mu_n$ be the eigenvalues of $P_{n-1} H P_{n-1}$. By the exercise cited
we have

$$\lambda_k \ge \mu_k \quad \text{for } k = 1, \ldots, n.$$

Now $\mu_1, \ldots, \mu_{n-1}$ are just the eigenvalues of $K_{n-1}$, and by the induction
hypothesis these are all positive, while clearly $\mu_n = 0$. Thus, we have
$\lambda_k > 0$ for k = 1, . . . , n − 1. But we also have

$$0 < \det A_n = \det H = \lambda_1 \cdots \lambda_n,$$

and the two together give $\lambda_k > 0$ for k = 1, . . . , n.
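The determinant test of Theorem 5.2 is straightforward to carry out for small matrices. The following is a minimal sketch, assuming the matrix is symmetric and has exact (integer or rational) entries; the function names and the sample matrices are illustrative choices, and cofactor expansion is used only because it is short, not because it is efficient.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def leading_minors(A):
    """det A_k for k = 1, ..., n, where A_k is the upper left k by k corner of A."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_strictly_positive_definite(A):
    """Criterion of Theorem 5.2; A is assumed symmetric."""
    return all(d > 0 for d in leading_minors(A))

A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
B = [[1, 2],
     [2, 1]]
print(leading_minors(A), is_strictly_positive_definite(A))
print(leading_minors(B), is_strictly_positive_definite(B))
```

For the second matrix the form takes the value $1 - 4 + 1 = -2$ at the point (1, −1), so it is not positive definite, in agreement with the sign of its second minor.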

Prove that $\mu_1, \ldots, \mu_{n-1}$ are the eigenvalues of $K_{n-1}$ and that $\mu_n = 0$.


Decide whether the quadratic form 
OG. 2) — 36° — 2xy = Dye ye 


is positive definite. 


Another way to look at Theorem 5.1 is that in a neighborhood of a the
graph of f looks enough like the graph of $Q_r$, so that if $Q_r$ has a strict local
minimum at 0, then f has a strict local minimum at a; if $Q_r$ has a strict local
maximum at 0, then f has a strict local maximum at a; if $Q_r$ has neither a local
minimum nor a local maximum at 0, then f has neither a local minimum nor a
local maximum at a. The gap between $Q_r$ and f shows up in the word "strict,"
which appears in the first two statements but not in the third.


In the case of two variables, draw the graph of a $Q_2$ with two positive eigenvalues,
of a $Q_2$ with two negative eigenvalues, and of a $Q_2$ with one of each. Observe
the minimum, maximum, and saddle point that result.


part III 




13 : Integration 


1 


INTRODUCTION 


In Chapter 4 the Riemann integral of a function f on an interval J was defined
as follows: Partition the interval J into small intervals $J_k$. In each $J_k$ choose
a point $\xi_k$. Define the integral to be the limit of the sums

$$\sum_{k=1}^{n} f(\xi_k)\, l(J_k),$$

where $l(J_k)$ is the length of $J_k$, and the limit is taken as the maximum length
goes to 0.

The same idea carries over to higher dimensions. If, for example, f is
defined on a rectangle in the plane, partition the rectangle into small rectangles
$R_k$. In each $R_k$ choose a point $\xi_k$. Define the integral to be the limit of the
sums

$$\sum_{k=1}^{n} f(\xi_k)\, a(R_k), \tag{1}$$

where $a(R_k)$ is the area of $R_k$, and the limit is taken as the maximum diameter
of $R_k$ goes to 0.

There is no real additional difficulty as long as the function f is continuous,
but the resulting theory is quite unsatisfactory. Integration over rectangles is
not nearly enough. To find the volume of a ball, for example, involves integra- 
tion over a circle—which is much more complicated because a circle cannot be 
partitioned into rectangles. 

One possibility is to choose some large rectangle R, which contains the 
circle C, and to partition R into rectangles R,. In this case there are two 
different sums that are equally reasonable in formula (1). The first involves 




the rectangles contained in C, the second those that meet C. The problem that 
haunts the whole theory is to show that these two equally reasonable sums lead 
to the same result. 

A second possibility is to extend the function to be integrated, which is 
defined initially on the circle, by putting it equal to 0 outside. The difficulty is
ultimately the same, for the extended function is not continuous but, in general, 
discontinuous at each boundary point of the circle. In both cases the nature of 
the boundary of the circle must be analyzed very carefully. 

Another way to look at the Riemann integral is this. The sum (1) can be 
regarded as the integral of a function that is constant [with the value $f(\xi_k)$] on
each of the rectangles $R_k$. The problem, then, is to approximate the given
function f by functions that are constant on rectangles. If f is continuous and 
is defined on a rectangle, then the approximation can be done quite well and 
quite easily; but if either condition is violated, it cannot. 

A different idea, which looks similar at first but is radically better, is to 
approximate the given function f by functions that are constant on more general 
sets than intervals or rectangles. This can be accomplished as follows:

Suppose that f is nonnegative and bounded, and let $\varepsilon > 0$ be given. Define

$$E_j = \{x : j\varepsilon \le f(x) < (j + 1)\varepsilon\}, \tag{2}$$

and let $f_\varepsilon$ be the function that takes the value $j\varepsilon$ on the set $E_j$. The set X on
which f is defined is thus partitioned into a finite number of sets $E_j$ on each of
which $f_\varepsilon$ is constant; moreover,

$$f(x) - \varepsilon < f_\varepsilon(x) \le f(x) \quad \text{for every } x. \tag{3}$$

This shows that $f_\varepsilon \to f$ uniformly as $\varepsilon \to 0$. If we let $f_k$ be the function $f_\varepsilon$ with
$\varepsilon = 2^{-k}$, we not only have that $f_k \to f$ uniformly but also that the sequence
$\{f_k\}$ is increasing.
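The approximation just described amounts to rounding the values of f down to the nearest multiple of $\varepsilon$. Here is a small sketch of that construction; the test function $x^2$ on [0, 1], the sample points, and the name `f_eps` are assumptions made for the illustration only. It builds $f_k$ for $\varepsilon = 2^{-k}$ and leaves inequality (3) and the monotonicity of the sequence to be checked at the sample points.

```python
import math

def f_eps(f, eps):
    """Step function equal to j*eps on E_j = {x : j*eps <= f(x) < (j+1)*eps}."""
    return lambda x: eps * math.floor(f(x) / eps)

def f(x):
    # A nonnegative bounded function on [0, 1] (illustrative choice).
    return x * x

points = [i / 64 for i in range(65)]
steps = [f_eps(f, 2.0 ** -k) for k in range(8)]   # f_0, f_1, ..., f_7

# Largest gap f(x) - f_k(x) over the sample points, for each k.
gaps = [max(f(x) - fk(x) for x in points) for fk in steps]
print(gaps)
```

Each gap is smaller than the corresponding $2^{-k}$, which is the uniform convergence asserted in the text; powers of 2 are used so that the floating-point arithmetic in this sketch is exact.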

With such good approximations available, the problem of defining the
integral of f is reduced to that of defining the integral of $f_\varepsilon$, which in turn is
obviously equivalent to the problem of defining the "length" or "area" of the
sets $E_j$. Let us look at the two-dimensional case where the language is similar
to the language we shall use in n dimensions.

Before the pioneering work of the French mathematician Henri Lebesgue 
(about 1900), the area of a set E in the plane was defined to be the number 


$$a(E) = \inf \sum a(R_k), \tag{4}$$


where $\{R_k\}$ is a finite sequence of rectangles covering E and the inf is taken over
all such finite sequences. This is a very reasonable and intuitive definition— 
but it leads right back to the Riemann integral with all the problems that we 
have just mentioned. 


Exercise 1 




Lebesgue made the apparently innocuous modification of allowing infinite 
sequences of rectangles as well as finite sequences, and this transformed the 
whole subject. Let us look at the effect of this modification in one simple 
example. 

One of the basic properties of area is that if S and T are disjoint sets, then
the area of $S \cup T$ should be the sum of the area of S and
the area of T.


Show that $a(S \cup T) \le a(S) + a(T)$ and that $\mu(S \cup T) \le \mu(S) + \mu(T)$, where
$\mu$ is Lebesgue's modification of a.


Let R be a rectangle and let S be a sequence that is dense in R, for instance,
the set of all points in R with rational coordinates. Let T = R − S. Any
finite sequence of rectangles that covers either S or T must cover all of R.
Consequently, a(S) = a(T) = a(R), and we do not at all have the desired
additivity formula.

On the other hand, if $S = \{x_k\}$, and $\varepsilon > 0$ is given, then let $R_k$ be the
rectangle with center $x_k$ and area $\varepsilon/2^k$. Then the sequence $\{R_k\}$ covers S, and

$$\mu(S) \le \sum_{k=1}^{\infty} \frac{\varepsilon}{2^k} = \varepsilon.$$

Thus, $\mu(S) = 0$. Exercise 1 gives $\mu(T) \le \mu(R) \le \mu(T) + 0$, so $\mu(T) = \mu(R)$,
and we do have the required additivity, $\mu(R) = \mu(S) + \mu(T)$.
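The covering argument can be made concrete. The sketch below is a one-dimensional analogue, which is an assumption made to keep the example short: the rectangle R is replaced by the interval [0, 1] and the rectangles $R_k$ by intervals of length $\varepsilon/2^k$. The enumeration of the rationals and the function names are illustrative choices, and exact fractions are used so that the total length can be compared exactly.

```python
from fractions import Fraction

def rationals_in_unit_interval():
    """Enumerate the rationals in [0, 1] without repetition: 0, 1, 1/2, 1/3, 2/3, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            x = Fraction(p, q)
            if x.denominator == q:      # skip values already listed in lower terms
                yield x
        q += 1

def cover(eps, n):
    """Pairs (center, length) covering the first n rationals; the k-th has length eps / 2**k."""
    gen = rationals_in_unit_interval()
    return [(next(gen), eps / 2 ** k) for k in range(1, n + 1)]

eps = Fraction(1, 100)
intervals = cover(eps, 50)
total_length = sum(length for _, length in intervals)
print(float(total_length), float(eps))
```

However many rationals are covered, the total length stays below $\varepsilon$, which is why the countable dense set S has Lebesgue measure 0 although no finite covering can do the same job.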

Even with the Lebesgue definition of area we shall not have the additivity
$\mu(S \cup T) = \mu(S) + \mu(T)$ for every pair of disjoint sets, but we shall have it for
any pair that can come up in practice. The approach of Lebesgue is perhaps


[Figure: Henri Lebesgue]




DEFINITION 
2.1 


Exercise 1 


Exercise 2 


somewhat more technical and harder to grasp in the beginning, but in the end 
it turns out to be simpler and much more powerful. 


LEBESGUE MEASURE 


A closed rectangle in $R^n$ is a set of the form

$$Q = \{x : a_i \le x_i \le b_i,\ i = 1, \ldots, n\},$$

where of course $a_i < b_i$. The corresponding open rectangles are defined similarly.
The center of the rectangle is the point $(a + b)/2$, the side lengths are the
numbers $b_i - a_i$, and the volume $v(Q)$ is the product of the side lengths. The
term rectangle by itself refers to either a closed rectangle or an open rectangle.


The Lebesgue outer measure of a set $A \subset R^n$ is the number

$$\mu(A) = |A| = \inf \sum v(Q_k), \tag{1}$$

where $\{Q_k\}$ is a sequence of rectangles covering A and the inf is taken over all
such sequences.


The definition gives the same result if the rectangles are required to be either
open or closed. [Hint: To get open ones, replace $Q_k$ by a $Q_k'$ with the same
center and just slightly larger side lengths so that $v(Q_k') < v(Q_k) + \varepsilon/2^k$.]


For any set $A \subset R^n$ and any point $a \in R^n$, we have |A + a| = |A|.


The first step in showing that the Lebesgue measure is like a volume is to
show that $|Q| = v(Q)$ for any rectangle Q. It is plain from the definition that
$|Q| \le v(Q)$, for there is always the covering with $Q_1 = Q$ and $v(Q_k) < \varepsilon/2^k$ for
k > 1. In getting the reverse inequality it is convenient to use special coverings.
For each $\delta > 0$, let $\mathcal{L}_\delta$ be the lattice of points in $R^n$ with coordinates of the form
$m\delta$, m an integer, and let $\mathcal{C}_\delta$ be the set of closed cubes of side length $\delta$ with
vertices in $\mathcal{L}_\delta$. The point of using these cubes is that they fit together nicely
instead of at random (Figure 1). This is expressed by the following lemma,


[Figure 1]


LEMMA 
2.2


Exercise 3 


LEMMA 
2.3 


Proof 


THEOREM 
2.4 


Proof 


Exercise 4 


Exercise 5 




If Q is a closed rectangle with vertices in $\mathcal{L}_\delta$, then Q is a finite union of cubes
in $\mathcal{C}_\delta$ and the volume is the sum of the volumes.


Try to write out a proof of this obvious lemma. Do first dimensions 1, 2, and 3, 
and then perhaps try induction on the dimension and on the length of the 
shortest side. 


If Q is any rectangle, let $Q^\delta$ be the union of the cubes in $\mathcal{C}_\delta$ that meet Q.
Then $Q^\delta$ is a rectangle, and for any given $\varepsilon > 0$ we have $v(Q^\delta) \le v(Q) + \varepsilon$
if $\delta$ is small enough.


If $Q = \{x : a_i \le x_i \le b_i\}$, let $c_i$ be the largest number of the form $m\delta$ that
is $< a_i$, and let $d_i$ be the smallest one that is $> b_i$. Then $Q^\delta = \{x : c_i \le
x_i \le d_i\}$. The statement on the volumes comes from the fact that $d_i - c_i \le
b_i - a_i + 2\delta$. (Note that the $\delta$ here depends on Q as well as on $\varepsilon$.)
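The construction of the enlarged rectangle in this proof is completely explicit, so it can be tried on an example. The sketch below uses exact fractions; the function names and the sample rectangle are assumptions made for the illustration. It computes the corners $c_i$ and $d_i$ for a closed rectangle and compares the resulting volume with the bound coming from $d_i - c_i \le b_i - a_i + 2\delta$.

```python
from fractions import Fraction
from math import ceil, floor, prod

def enlarge_to_lattice(a, b, delta):
    """Corners (c, d) of the lattice rectangle built from the closed rectangle [a, b]."""
    c = [(ceil(ai / delta) - 1) * delta for ai in a]    # largest multiple of delta below a_i
    d = [(floor(bi / delta) + 1) * delta for bi in b]   # smallest multiple of delta above b_i
    return c, d

def volume(lo, hi):
    """Product of the side lengths."""
    return prod(h - l for l, h in zip(lo, hi))

# Sample rectangle [3/10, 6/5] x [0, 1] (hypothetical data).
a = [Fraction(3, 10), Fraction(0)]
b = [Fraction(6, 5), Fraction(1)]

for delta in (Fraction(1, 4), Fraction(1, 100), Fraction(1, 1000)):
    c, d = enlarge_to_lattice(a, b, delta)
    bound = prod(bi - ai + 2 * delta for ai, bi in zip(a, b))
    print(delta, volume(c, d), bound, volume(a, b))
```

As $\delta$ shrinks, the volume of the enlarged rectangle decreases toward the volume 9/10 of the original one, which is the content of the lemma.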


If Q is a rectangle, then $|Q| = v(Q)$.


Let Q be a closed rectangle, and let $\varepsilon > 0$ be given. According to the
definition and Exercise 1, there is a sequence $\{Q_k\}$ of open rectangles
covering Q with

$$|Q| \ge \sum v(Q_k) - \varepsilon. \tag{2}$$

Since Q is closed (hence compact) and the $Q_k$ are open, a finite number of
the $Q_k$ cover Q. We can throw the rest away and suppose that the sum
in (2) is finite, say with N terms. By Lemma 2.3 we can choose $\delta$ small
enough so that $v(Q_k^\delta) \le v(Q_k) + \varepsilon/N$; hence

$$|Q| \ge \sum v(Q_k^\delta) - 2\varepsilon. \tag{3}$$

It is plain that $Q^\delta$ is contained in the union of the $Q_k^\delta$. Consequently, it is
equal to the union of some of the cubes in $\mathcal{C}_\delta$ that make up the $Q_k^\delta$.
So Lemma 2.2 gives

$$v(Q^\delta) \le \sum v(Q_k^\delta);$$

hence $v(Q) \le v(Q^\delta) \le |Q| + 2\varepsilon$, which proves the theorem because $\varepsilon$ is
arbitrary and we know already that $|Q| \le v(Q)$.


We have proved the theorem for a closed rectangle. What about an open one? 


For fixed $\delta > 0$, let $\mu_\delta(A)$ be the number given by Definition 2.1 when the
rectangles $Q_k$ are required to be closed (or open) cubes of diameter $< \delta$. Show
that $\mu_\delta(A) = |A|$. [Hint: It is plain that $|A| \le \mu_\delta(A)$. To go the other way,




Exercise 6 


Exercise 7 


THEOREM 
2.5 


Proof 


suppose that $|A| < \infty$, let $\varepsilon > 0$ be given, and choose a covering $\{Q_k\}$ so that

$$|A| \ge \sum v(Q_k) - \varepsilon.$$

For each k use Lemma 2.3 to find a rectangle $Q_k'$ that is a finite union of cubes
of diameter $< \delta$ and satisfies $v(Q_k') \le v(Q_k) + \varepsilon/2^k$. Arrange all these little
cubes in a single sequence by counting off first the ones that make up $Q_1'$, then
the ones that make up $Q_2'$, and so on, and use Lemma 2.2.]


If A and B are any two sets, then

$$|A \cup B| \le |A| + |B|,$$

and if A and B are at a positive distance apart, then

$$|A \cup B| = |A| + |B|.$$

[Hint: The first part is easy from the definition. The second part follows from
the first and from Exercise 5 if you take $\delta$ smaller than the distance between A
and B. Recall that the distance is defined by $d(A, B) = \inf \{d(x, y) : x \in A$ and
$y \in B\}$.]


For any set A, $|A| = \inf |G|$, where the inf is taken over all open sets $G \supset A$.
(Hint: Use the definition and Exercise 1.)


The Lebesgue outer measure satisfies the following conditions:
(a) $|\emptyset| = 0$ (where $\emptyset$ is the empty set).
(b) If $A \subset B$, then $|A| \le |B|$.
(c) If $A = \bigcup_{j=1}^{\infty} A_j$, then $|A| \le \sum |A_j|$.


Parts (a) and (b) are perfectly obvious, but part (c) requires some proof.
It can be assumed that each $|A_j|$ is finite, for otherwise the inequality is
automatic. Let $\varepsilon > 0$ be given, and for each j choose a sequence $\{Q_k^j\}$
covering $A_j$ and satisfying

$$\sum_k v(Q_k^j) \le |A_j| + \frac{\varepsilon}{2^j};$$

therefore,

$$\sum_{j,k} v(Q_k^j) \le \sum_j \Bigl( |A_j| + \frac{\varepsilon}{2^j} \Bigr) = \sum_j |A_j| + \varepsilon. \tag{4}$$

When k and j both vary, $\{Q_k^j\}$ is a sequence of rectangles covering A, so
|A| is at most the left side of (4), which proves the theorem because $\varepsilon$ is
arbitrary.


Remark 


Exercise 8 


Exercise 9 


Exercise 10 


THEOREM 
2.6 


Proof 




In the proof above we have used the fact that the double sequence $\{Q_k^j\}$ can be
arranged in an ordinary sequence. This can be done by thinking of $\{Q_k^j\}$ as an
infinite matrix and counting off the terms as they are met along the following
path.

[Diagram: a path through the entries of the infinite matrix $\{Q_k^j\}$.]

We have also used the fact that the sum of the corresponding series with terms
$|Q_k^j|$ is equal to the sum

$$\sum_j \Bigl( \sum_k |Q_k^j| \Bigr).$$

Prove that this is so for any arrangement of the $Q_k^j$ in an ordinary sequence by
showing that in both cases the sum is the least upper bound of all finite sums of
the $|Q_k^j|$. The point here is that the numbers $|Q_k^j|$ are nonnegative.
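One way to arrange a double sequence in an ordinary sequence is to run through the index pairs diagonal by diagonal. The generator below is a sketch of that idea; the diagonal order is an assumption (the path drawn in the text may traverse the matrix differently), and any path that reaches every entry exactly once serves the same purpose.

```python
from itertools import islice

def diagonal_pairs():
    """Yield every pair (j, k) of positive integers exactly once, diagonal by diagonal."""
    s = 2                                # s = j + k is constant along a diagonal
    while True:
        for j in range(1, s):
            yield (j, s - j)
        s += 1

first_ten = list(islice(diagonal_pairs(), 10))
print(first_ten)
```

The pair (j, k) is reached after finitely many steps, namely once all diagonals with smaller j + k have been exhausted, so every term $Q_k^j$ receives a place in the single sequence.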


A set X is said to be countable if its points can be counted off, that is, arranged
in a finite or infinite sequence. Show that any subset of a countable set is
countable and use the argument of the Remark to show that if X and Y are
countable, then $X \times Y$ is countable. Show that a countable union of countable
sets is countable.


Show that the rational numbers are countable, and more generally that the
points in $R^n$ with rational coordinates are countable.


A countable union of sets of measure 0 is again a set of measure 0. (In particular,
every countable set has measure 0, so no rectangle in $R^n$ is countable.)


We shall close the section with some theorems to show that certain kinds 
of sets must have measure 0. Some of these have immediate interest and others 
will be useful later on. When dealing with various spaces $R^n$, we shall write
$|\ |_n$ to display the dimension. The following result is often useful.


Let $A \subset R^m$ be any set and $\varphi: A \to R^n$ be any function. If each point
$a \in A$ has a neighborhood G such that $|\varphi(A \cap G)|_n = 0$, then $|\varphi(A)|_n = 0$.


The hypothesis implies that for each point $a \in A$ there is an open ball B
with rational center and radius such that $a \in B$ and $|\varphi(A \cap B)|_n = 0$,




THEOREM 
2.7


Proof 


for the neighborhood G must contain such a ball. By Exercises 8 and 9
the family $\mathcal{B}$ of such balls is countable. Thus, A is the countable union
of the sets $A \cap B$ with $B \in \mathcal{B}$; hence $\varphi(A)$ is the countable union of the
sets $\varphi(A \cap B)$, each of which has measure 0. By Exercise 10, $\varphi(A)$ has
measure 0.


Note that when $\varphi$ is the identity function, the theorem says that if A has
measure 0 "locally," then A has measure 0.


If $\varphi: R^m \to R^n$ satisfies $|\varphi(x) - \varphi(y)| \le M|x - y|^p$, $p \ge m/n$, on a set
$A \subset R^m$, then

(a) If $p > m/n$, then $|\varphi(A)|_n = 0$.

(b) If $p = m/n$, then $|\varphi(A)|_n \le 2^n M^n (\sqrt{m})^m |A|_m$.


Since the measures can be defined by cubes (Exercise 5), it is convenient
to use the metric $\|x\| = \max |x_i|$, in which the ball with center a and radius
r is just the cube Q(a; 2r) with center a and side length 2r. Since
$\|x\| \le |x| \le \sqrt{m}\, \|x\|$ in $R^m$ and $\|x\| \le |x| \le \sqrt{n}\, \|x\|$ in $R^n$, we have

$$\|\varphi(x) - \varphi(y)\| \le M(\sqrt{m})^p \|x - y\|^p. \tag{5}$$

We shall show that this implies that for any cube Q with side length
$\le \varepsilon < 1$ we have

$$|\varphi(A \cap Q)|_n \le 2^n M^n (\sqrt{m})^{pn} \varepsilon^{pn - m} |Q|_m. \tag{6}$$

Let Q = Q(b; r) with $r \le \varepsilon$. If Q does not meet A, there is nothing to
prove because the left side of (6) is 0. On the other hand, if a is a fixed
point of $A \cap Q$ and x is an arbitrary point of $A \cap Q$, then $\|x - b\| \le r/2$
and $\|a - b\| \le r/2$, so $\|x - a\| \le r$; hence $\|\varphi(x) - \varphi(a)\| \le M(\sqrt{m})^p r^p$,
which means that

$$\varphi(A \cap Q) \subset Q\bigl(\varphi(a);\ 2M(\sqrt{m})^p r^p\bigr).$$

It follows that

$$|\varphi(A \cap Q)|_n \le 2^n M^n (\sqrt{m})^{pn} r^{pn} \le 2^n M^n (\sqrt{m})^{pn} \varepsilon^{pn - m} r^m,$$

which is exactly (6).
Now, if $\{Q_k\}$ is any covering of A by cubes of side length $\le \varepsilon$, then
(6) gives

$$|\varphi(A)|_n \le \sum |\varphi(A \cap Q_k)|_n \le 2^n M^n (\sqrt{m})^{pn} \varepsilon^{pn - m} \sum |Q_k|_m.$$


Exercise 11 


COROLLARY 
2.8 


Proof 


COROLLARY 
2.9 


Proof 


COROLLARY 
2.10 


Proof 




According to Exercise 5, the measure of A is the inf of the sums on the
right, so we have

$$|\varphi(A)|_n \le 2^n M^n (\sqrt{m})^{pn} \varepsilon^{pn - m} |A|_m.$$

If $p > m/n$, this gives (a) because $\varepsilon$ is arbitrary, while if $p = m/n$ it gives (b).


What about the last statement of the proof when $|A|_m = \infty$?


The theorem has a number of interesting corollaries. 


The notion of measure 0 is independent of the coordinate system in $R^n$. Every
plane of dimension m < n has measure 0.


If T is a linear transformation, then $|Tx| \le M|x|$, so if $m \le n$, then the
hypothesis of the theorem is satisfied for $\varphi = T$ with p = 1. If V is a
subspace of $R^n$ of dimension m < n, then $V = T(R^m)$, and part (a) of the
theorem shows that |V| = 0. If $\Pi$ is a plane of dimension m < n, then
$\Pi = V + a$, and Exercise 2 shows that $|\Pi| = 0$. If $\nu$ is the Lebesgue
measure relative to some other coordinates and T is the linear transformation
that changes coordinates, then $\nu(A) = |T(A)|$, and part (b)
shows that if |A| = 0, then $\nu(A) = 0$.


Let $\varphi: R^m \to R^n$ be of class $C^1$ at each point of the set A. If m < n, then
$|\varphi(A)| = 0$. In particular, the measure of any smooth manifold $M \subset R^n$
of dimension m < n is 0.


From Theorem 5.3 of Chapter 10 we know that for each point $a \in A$
there is a ball B(a; r) such that

$$|\varphi(x) - \varphi(y)| \le (\|d\varphi(a)\| + 1)\,|x - y| \quad \text{for } x, y \in B(a; r),$$

so part (a) of the theorem shows that $|\varphi(A \cap B(a; r))| = 0$. Then
Theorem 2.6 shows that $|\varphi(A)| = 0$. As for the smooth manifolds, if
$b \in M$ and $\varphi$ is a local parametric representation at b with $\varphi(a) = b$, then
what has been proved shows that $|\varphi(B(a; r))| = 0$, whereas $\varphi(B(a; r))$
contains a neighborhood of b in M. Thus, M has measure 0 locally; so
by Theorem 2.6, M has measure 0.


Let $\varphi: R^n \to R^n$ be of class $C^1$ at each point of the set A. If |A| = 0, then
$|\varphi(A)| = 0$.


The proof is the same as the one above, except that we use part (b) of 
Theorem 2.7 instead of part (a). 




Exercise 12 


Exercise 13 


Exercise 14 


3 


Corollary 2.9 is a good example of how measure theory can be used to prove
interesting results that have nothing to do with measure theory. We have
mentioned, for example, that there are paths that fill a cube in $R^n$. Corollary
2.9 shows that no path of class $C^1$ can fill a cube in $R^n$, n > 1, for a cube has
positive measure. It also shows that if m < n there is no possibility of an
inverse-function theorem for functions $\varphi: R^m \to R^n$ of class $C^1$; for every open
set in $R^n$ contains a cube and hence has positive measure, while $\varphi(R^m)$ has
measure 0. (Previously, we knew that there could not be a differentiable
inverse, but this shows that the range cannot even contain any open set.)


Let $f: R^m \to R^n$ be of class $C^p$, $p > m/n$, at each point of a set A. If the partial
derivatives $D_i f$ all vanish on A for $1 \le |i| \le p - 1$, then $|f(A)| = 0$. [Hint:
It is enough to show that for each point $a \in A$ there is a ball B = B(a; r) such
that

$$|f(x) - f(y)| \le M|x - y|^p \quad \text{for } x, y \in A \cap B. \tag{7}$$

Choose B so that the partial derivatives of order p exist and are bounded on B,
and then write Taylor's formula at y for each of the coordinate functions.
Inequality (7) will result.]


Let $T: R^n \to R^n$ be the linear transformation given by $Te_i = \lambda_i e_i$, where
$e_1, \ldots, e_n$ is the usual basis of $R^n$. For any set $A \subset R^n$ we have

$$|T(A)| = |\lambda_1 \cdots \lambda_n|\, |A|.$$

(Hint: If Q is the rectangle with center a and side lengths $s_1, \ldots, s_n$, then
T(Q) is the rectangle with center Ta and side lengths $|\lambda_1| s_1, \ldots, |\lambda_n| s_n$.
Cover A by rectangles to get

$$|T(A)| \le |\lambda_1 \cdots \lambda_n|\, |A|.$$

If some $\lambda_i$ is 0 you are done, and if not you can apply the same thing to $T^{-1}$.)


$|B(a; r)| = cr^n$, with $c = |B(0; 1)|$. (Hint: Exercise 13 with Tx = rx and
Exercise 2.)


OUTER MEASURES 


Just as it was convenient to study continuity in the abstract setting of metric 
spaces, it is also convenient to study measure and integration in an abstract 
setting. The abstract setting is much simpler than $R^n$, because it involves only
three simple axioms, and it also has other interesting interpretations. 


DEFINITION 
3.1


DEFINITION 
3.2 


Exercise 1 


THEOREM 
3.3 


Proof 




An outer measure on a set X is a function $\mu$ from the subsets of X to the
nonnegative real numbers and $+\infty$ with the three properties

(a) $\mu(\emptyset) = 0$.

(b) If $A \subset B$, then $\mu(A) \le \mu(B)$.

(c) If $A = \bigcup_{k=1}^{\infty} A_k$, then $\mu(A) \le \sum \mu(A_k)$.


These are the properties listed in Theorem 2.5 for the Lebesgue outer measure.
The number $\mu(A)$ is called the outer measure, or more often just the measure, of
the set A. Another interesting example of an outer measure on an arbitrary
set X is obtained by putting $\mu(A)$ equal to the number of points in A. This
one is called the counting measure. When X is the positive integers, the resulting
theory of integration is the theory of absolutely convergent series. Still another
important example is obtained by choosing any function $\varphi: X \to R^n$ and putting
$\mu(A) = |\varphi(A)|$.

If we think of the outer measure as a generalization of length, area, or
volume (which it is in the Lebesgue case), we would like to see results to the
effect that if A and B are disjoint sets, then $\mu(A \cup B) = \mu(A) + \mu(B)$. It is
plain from (c) that

$$\mu(A \cup B) \le \mu(A) + \mu(B), \tag{1}$$


but the opposite inequality is simply false when A and B are completely arbi- 
trary. The first big job is to pick out a wide class of sets for which it is true. 
The definition is more technical than intuitive. It will simply have to be 
accepted for the sake of what can be done with it. 


The set $A \subset X$ is $\mu$-measurable if for every set $S \subset X$ we have

$$\mu(S) = \mu(S \cap A) + \mu(S - A). \tag{2}$$
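On a finite set the measurability condition can be tested by brute force, since there are only finitely many test sets S. The sketch below does this for two outer measures on a three-point set: the counting measure, and a hypothetical "coarse" outer measure invented for this illustration (0 on the empty set, 2 on the whole set, 1 on every other subset). The function names are likewise illustrative.

```python
from itertools import combinations

def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def measurable_sets(X, mu):
    """All A with mu(S) == mu(S & A) + mu(S - A) for every subset S of X."""
    all_sets = subsets(X)
    return [A for A in all_sets
            if all(mu(S) == mu(S & A) + mu(S - A) for S in all_sets)]

X = frozenset({0, 1, 2})

def counting(A):
    """Counting measure: the number of points in A."""
    return len(A)

def coarse(A):
    """Hypothetical outer measure: 0 on the empty set, 2 on X, 1 otherwise."""
    if not A:
        return 0
    return 2 if A == X else 1

print(len(measurable_sets(X, counting)))
print(measurable_sets(X, coarse))
```

For the counting measure every subset passes the test. For the coarse example only the empty set and X do: with A = {0} and S = {0, 1}, the right side of (2) is 1 + 1 while the left side is 1, which shows how the test set S detects the failure of additivity.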


Every set of measure 0 is measurable. 


If A and B are measurable, then $A \cup B$, $A \cap B$, and X − A are measurable.


That X − A is measurable is clear, because the definition is symmetric
in A and X − A. Let us look at $A \cap B$. First split the arbitrary set S
into the part in A and the part not in A, and use the fact that A is measurable
to get (2). Now split each of the sets $S \cap A$ and $S - A$ into the
part in B and the part not in B, and use the fact that B is measurable
to get

$$\mu(S \cap A) = \mu(S \cap A \cap B) + \mu((S \cap A) - B),$$
$$\mu(S - A) = \mu((S - A) \cap B) + \mu((S - A) - B). \tag{3}$$




THEOREM 
3.4 


Proof 


Since $S - (A \cap B) = ((S \cap A) - B) \cup ((S - A) \cap B) \cup ((S - A) - B)$,
we have

$$\mu(S - (A \cap B)) \le \mu((S \cap A) - B) + \mu((S - A) \cap B) + \mu((S - A) - B),$$

which, together with (2) and (3), gives

$$\mu(S) \ge \mu(S \cap A \cap B) + \mu(S - (A \cap B)).$$

The opposite inequality is always true, so $A \cap B$ is measurable. $A \cup B$
can be treated in the same way, or by noticing that

$$X - (A \cup B) = (X - A) \cap (X - B)$$

and using what has been proved for complements and intersections.


Let. {Ax} be a sequence of disjoint measurable sets with union A. Then for 
any set S, 


u(S) = Dus Ay) + u(S — A). (4) 


If B, is the union of the first n A;’s, then 
SOB, = (Sit \ An) So (SOB) 


and this is exactly the decomposition of § B, into the part in A, and 
the part notin A,. Therefore, 


SN By) = pS’ Ay ESO) Bea) 


and induction gives 


n 


SO By) = Y u(SO Ad. 


k=1 


Now, B, is measurable, by Theorem 3.3, so 


w(S) = w(SA By) + u(S — By) = Y w(SO Ax) + u(S = A). 
he 


1 


[Where does »(S — B,) > u(S — A) come from?] Since this holds for 
every n, we have 


n(s) >) wSA Ad) + a(S - A), 
k=1 


and, as usual, the opposite inequality is automatic. 


THEOREM 3.5  If $\{A_k\}$ is a sequence of disjoint measurable sets with union $A$,
then $A$ is measurable and
\[ \mu(A) = \sum_k \mu(A_k). \tag{5} \]

Proof  To get (5) take $A = S$ in formula (4). To get the measurability note
that the sum in (4) is at least $\mu(S \cap A)$, so (4) gives
$\mu(S) \ge \mu(S \cap A) + \mu(S - A)$.

THEOREM 3.6  The union and intersection of a sequence of measurable sets are
measurable.

Proof  Let $B_1 = A_1$, and $B_k = A_k - \bigcup_{j=1}^{k-1} B_j$. Each $B_k$ is
measurable, by Theorem 3.3 and induction. The $B_k$ are disjoint, so their union
is measurable by Theorem 3.5. But the union of the $A_k$ is the same as the union
of the $B_k$. As for the intersection, $\bigcap A_k = X - \bigcup (X - A_k)$.

Exercise 2  If $\{A_k\}$ is an increasing sequence of measurable sets with union $A$,
then $\mu(A_k) \to \mu(A)$. [Hint: Put $A_0 = \emptyset$, and write
$A_n = \bigcup_{k=1}^{n} (A_k - A_{k-1})$.]

Exercise 3  If $\{A_k\}$ is a decreasing sequence of measurable sets with intersection
$A$, then $\mu(A_k) \to \mu(A)$ provided at least one $\mu(A_k)$ is finite. Show the
necessity of the proviso.

Exercise 4  Let $\mu$ be the counting measure on $X$. Every subset of $X$ is
$\mu$-measurable.


In Section 4 we shall show that with the Lebesgue measure on $R^n$ all the
open and closed sets are measurable. Combined with the theorems of this
section, this means that every set that can be reached in any kind of
constructive way is Lebesgue measurable. Nonmeasurable sets simply do not come up
in practice. Nonconstructive examples can be given, but we prefer to skip
this and to give instead an example of another natural outer measure on $R^n$
which plays an important auxiliary role in the theory of surface area and which
does have simple nonmeasurable sets.

Consider the problem of defining the length of a set in the plane. One
idea is to cover the set with small circles and to take the sum of the diameters
of the circles (Figure 2). If we take the sum of the squares of the diameters,
then we get effectively back to the Lebesgue measure, for the area of a circle
is just $\pi d^2/4$. This suggests that to get the area of a set in $R^3$ we might cover
with small balls and take the sum of the squares of the diameters. It suggests
in general that to get the $m$-dimensional area of a set in $R^n$ we might cover with
small balls and take the sum of the $m$th powers of the diameters. However,
it is technically convenient to allow coverings by sets of any kind. The formal
definition is as follows:

[Figure 2]

DEFINITION 3.7  Let $X$ be any metric space and let $m$ be any positive number.
For each $\varepsilon > 0$ define
\[ \mu_m^{\varepsilon}(A) = \inf \sum_k \delta(A_k)^m, \]
where $\delta(A_k)$ is the diameter of $A_k$, and $\{A_k\}$ is any sequence of sets of
diameter $< \varepsilon$ covering $A$.

Exercise 5  Show that $\mu_m^{\varepsilon}$ is an outer measure on $X$.

Exercise 6  Let $Y$ be a subset of $X$, and let $\nu_m^{\varepsilon}$ be the outer measure on $Y$
constructed above by considering $Y$ as a metric space on its own. Show that
$\nu_m^{\varepsilon}(A) = \mu_m^{\varepsilon}(A)$ for every $A \subset Y$. (This would obviously be
false if we insisted on covering by balls.)

Exercise 7  On $R^1$ the measure $\mu_1^{\varepsilon}$ coincides with the Lebesgue measure.
Consequently (Exercise 6), for any line segment $I$ in $R^n$, $\mu_1^{\varepsilon}(I)$ is the
length of $I$.


The only trouble with the measures $\mu_m^{\varepsilon}$ is that they have practically no
measurable sets. Consider, for example, two line segments $I$ and $J$ in $R^n$,
$n > 1$, with lengths between $\varepsilon/2$ and $\varepsilon$ and with a common midpoint. Since
$I \cup J$ has diameter $< \varepsilon$ we have

\[ \mu_1^{\varepsilon}(I \cup J) \le \varepsilon < \mu_1^{\varepsilon}(I) + \mu_1^{\varepsilon}(J), \]

in spite of the fact that $\mu_1^{\varepsilon}(I \cap J) = 0$, which shows that the segments $I$ and
$J$ are not measurable. The crucial point here is that $I$ and $J$ have length $< \varepsilon$.
If we fix $I$ and $J$ and let $\varepsilon \to 0$, the trouble disappears.
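
To get a feel for the sums in Definition 3.7, one can evaluate them for a single
explicit cover. The sketch below is an illustration and not part of the text: it
covers the unit square in $R^2$ by an $N \times N$ grid of closed squares of side
$1/N$ (each of diameter $\sqrt{2}/N$) and computes the sum of the $m$th powers of
the diameters. The function name is an assumption made for the example. Since this
is only one admissible cover, each value is an upper bound for the infimum, not
the infimum itself.

```python
import math

def grid_cover_sum(n_cells: int, m: float) -> float:
    """Sum of diam**m over an n_cells x n_cells grid of squares covering [0,1]^2."""
    diameter = math.sqrt(2) / n_cells      # diagonal of a square of side 1/n_cells
    return n_cells ** 2 * diameter ** m    # n_cells**2 identical terms

for m in (1, 2, 3):
    sums = [grid_cover_sum(n, m) for n in (1, 10, 100)]
    print(f"m = {m}: {sums}")
```

For $m = 2$ the sum stays at 2 however fine the grid, matching the remark that
squares of diameters lead back to area up to a constant. For $m = 3$ the sums
tend to 0, which is the mechanism behind the hint $\mu_n^{\varepsilon}(A) \le
\varepsilon^{n-m}\mu_m^{\varepsilon}(A)$ used later. For $m = 1$ the sums grow, but
growth along one family of covers proves nothing about the infimum by itself.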


DEFINITION 3.8  Let $X$ be a metric space and let $m$ be a positive number. The
$m$-dimensional Hausdorff measure on $X$ is the outer measure defined by
\[ \mu_m(A) = \sup_{\varepsilon > 0} \mu_m^{\varepsilon}(A) = \lim_{\varepsilon \to 0} \mu_m^{\varepsilon}(A). \]
(When $\varepsilon$ decreases, the admissible sequences in Definition 3.7 get fewer and
the inf becomes greater. This is why the sup in Definition 3.8 is a limit.)

Exercise 8  Show that $\mu_m$ is an outer measure on $X$ and prove the analog of
Exercise 6.

Exercise 9  Show that if the sets $A$ and $B$ are at a positive distance apart, then
$\mu_m(A \cup B) = \mu_m(A) + \mu_m(B)$. (The main theorem of Section 4 is that this is
precisely the condition for every closed set to be measurable.)

Exercise 10  If $\mu_m(A) < \infty$, then $\mu_n(A) = 0$ for every $n > m$. [Hint: Show
that $\mu_n^{\varepsilon}(A) \le \varepsilon^{n-m} \mu_m^{\varepsilon}(A)$.]

Exercise 11  For the measure $\mu_n$ on $R^n$ there are positive constants $c_1$ and
$c_2$ such that for every set $A$
\[ c_1 |A| \le \mu_n(A) \le c_2 |A|. \]
[Later we shall show that $\mu_n(A) = c|A|$.]

The Hausdorff measures will play the fundamental role in the theory of
surface area in Chapter 15. They are also useful examples to have in mind
during the development of the abstract theory, for they have some bad as well
as some good properties. For instance:

Exercise 12  If $m < n$, then no set of positive Lebesgue measure in $R^n$ is the
union of a sequence of sets of finite $\mu_m$ measure. (Hint: Use Exercises 10 and 11.)

Remark  The oddest example in the realm of Lebesgue nonmeasurability was discovered
in the 1920s by Stefan Banach and Alfred Tarski. They showed how to cut
a ball in $R^3$ into a finite number of pieces and then reassemble the pieces into
a larger ball.


4  MEASURABILITY IN $R^n$


The purpose of the section is to show that the Lebesgue measurable sets in
$R^n$ are exactly the sets that can be approximated by open or closed sets. The
first step is to show that the open and closed sets themselves are measurable.
It is enough, of course, to consider the closed sets, for every open set is the
complement of a closed set.


THEOREM 4.1  If $\mu$ is the Lebesgue measure on $R^n$ (or any Hausdorff measure
$\mu_m$ as in Definition 3.8), then every closed set is $\mu$-measurable.

This theorem is a consequence of an abstract theorem, whose hypothesis
is taken care of by Exercise 6 of Section 2 in the case of the Lebesgue measure
and by Exercise 9 of the last section in the case of the Hausdorff measures.

THEOREM 4.2  Let $\mu$ be an outer measure on a metric space $X$. The necessary and
sufficient condition that every closed set be measurable is that
$\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A$ and $B$ are a positive distance apart.

Proof  One half is easy. If $A$ and $B$ are a positive distance apart, then
$\bar{A} \cap B = \emptyset$, so
\[ (A \cup B) \cap \bar{A} = A \quad\text{and}\quad (A \cup B) - \bar{A} = B. \]
If every closed set is measurable, then in particular $\bar{A}$ is measurable, and
we have $\mu(A \cup B) = \mu(A) + \mu(B)$ just by using the definition of
measurability on the set $S = A \cup B$.

In proving the other half we have to show that if $A$ is closed and $S$
is arbitrary, then
\[ \mu(S) \ge \mu(S \cap A) + \mu(S - A). \]
We can suppose that $\mu(S) < \infty$, for otherwise the inequality is obvious.
The main point is to prove that if
\[ G_n = \{x : d(x, A) > 1/n\}, \]
then
\[ \lim_{n \to \infty} \mu(S - A - G_n) = 0. \tag{1} \]
Note that the $G_n$ are increasing and have union $X - A$, so
\[ S - A - G_n = \bigcup_{k=n+1}^{\infty} S \cap (G_k - G_{k-1}); \]
hence
\[ \mu(S - A - G_n) \le \sum_{k=n+1}^{\infty} \mu(S \cap (G_k - G_{k-1})). \tag{2} \]
Therefore, it will suffice to show that
\[ \sum_{k=1}^{\infty} \mu(S \cap (G_k - G_{k-1})) < \infty. \tag{3} \]
The picture looks as shown in Figure 3.


[Figure 3]

To prove (3) we look separately at the terms with $k$ odd and those
with $k$ even, for the reason that the sets involved will then be at a positive
distance from one another. Doing so we get
\[ \sum_{k \le n,\ k\ \mathrm{odd}} \mu(S \cap (G_k - G_{k-1}))
   = \mu\Bigl(\bigcup_{k \le n,\ k\ \mathrm{odd}} S \cap (G_k - G_{k-1})\Bigr) \le \mu(S). \]
The same is true for even $k$, so every partial sum in the series (3) is at
most $2\mu(S)$, and the series converges.

Now we have established (1), and we shall use it to prove the theorem.
Since $S \cap G_n = (S - A) \cap G_n$, we have
\[ \mu(S \cap G_n) \le \mu(S - A) \le \mu(S \cap G_n) + \mu(S - A - G_n); \]
hence
\[ \lim_{n \to \infty} \mu(S \cap G_n) = \mu(S - A). \]
Since $S \cap A$ and $S \cap G_n$ are at a positive distance, this gives
\[ \mu(S) \ge \mu(S \cap A) + \mu(S \cap G_n) \to \mu(S \cap A) + \mu(S - A), \]
and the theorem is proved.

From this theorem and Theorem 3.6 on sequences it follows that the union
of any sequence of closed sets is measurable. Such a set is called an $F_\sigma$ ($F$
standing for closed and $\sigma$ for union). Similarly, the intersection of any sequence
of open sets is measurable. Such a set is called a $G_\delta$ ($G$ for open and $\delta$ for
intersection). These sets, plus and minus sets of measure 0, make up all the
Lebesgue measurable sets.

THEOREM 4.3  A set $A \subset R^n$ is Lebesgue measurable if and only if it has the
equivalent properties:

(a) For each $\varepsilon > 0$ there is an open $G \supset A$ with $|G - A| < \varepsilon$.

(b) $A$ is a $G_\delta$ minus a set of measure 0.


Proof  To see that (a) implies (b), take open $G_k \supset A$ with $|G_k - A| < 1/k$,
and let $E$ be the intersection of the $G_k$. This is clearly a $G_\delta$ that contains
$A$ and satisfies $|E - A| = 0$.

We have just seen that every $G_\delta$ is measurable, and we know that
every set of measure 0 is measurable, so every $G_\delta$ minus a set of measure
0 is measurable.

Now let $A$ be measurable and write $A$ as the union of a sequence
$\{A_k\}$, each measurable and with finite measure [for example, $A_k$ is the
intersection of $A$ with the ball $B(0; k)$]. If $\varepsilon > 0$ is given, use Exercise
7 of Section 2 to find an open $G_k \supset A_k$ with $|G_k| < |A_k| + \varepsilon/2^k$. Since
$A_k$ is measurable, it follows that
\[ |G_k - A_k| = |G_k| - |A_k| < \frac{\varepsilon}{2^k}. \]
If $G$ is the union of the $G_k$, then $G$ is open, $G \supset A$, and
\[ |G - A| \le \sum_k |G_k - A_k| < \varepsilon. \]

THEOREM 4.4  A set $A \subset R^n$ is Lebesgue measurable if and only if it has the
equivalent properties:

(a) For each $\varepsilon > 0$ there is a closed $F \subset A$ with $|A - F| < \varepsilon$.

(b) $A$ is an $F_\sigma$ plus a set of measure 0.

Exercise 1  Prove the theorem by taking complements in Theorem 4.3.

Exercise 2  Every $F_\sigma$ is a $K_\sigma$, that is, is the union of a sequence of compact
sets. [Hint: Write the set as the union of an increasing sequence of closed sets
$F_k$ and put $K_k = F_k \cap \bar{B}(0; k)$.]

Exercise 3  What about Theorems 4.3 and 4.4 for the Hausdorff measures $\mu_m$ on $R^n$?

THEOREM 4.5  Let $\varphi: R^n \to R^n$ be of class $C^1$ at each point of a set $A$. If $A$
is Lebesgue measurable, then $\varphi(A)$ is Lebesgue measurable.

Proof  Use Theorem 4.4 and Exercise 2 to write $A$ as the union of a set $N$ of
measure 0 and a sequence $\{K_k\}$ of compact sets. Each $\varphi(K_k)$ is measurable
because it is compact, and $\varphi(N)$ is measurable because it has measure
0 (Corollary 2.10).

Remark  If $\varphi$ is just continuous, it still carries compact sets into compact sets,
so the validity of the above proof depends on whether it carries sets of measure 0
into sets of measure 0. It can be shown that there exist continuous functions that
do not carry sets of measure 0 into sets of measure 0. When $\varphi$ is such a
function, there always exist measurable sets $A$ such that $\varphi(A)$ is not measurable.


5  MEASURABLE FUNCTIONS


Throughout the section $\mu$ is an outer measure on a set $X$.

DEFINITION 5.1  A function $f: X \to R^n$ is measurable if $f^{-1}(G)$ is measurable for
every open set $G \subset R^n$.

There is a technical advantage in allowing functions $f$ that are defined on
a subset of $X$ rather than the whole set. In order not to multiply the notations,
we shall still write $f: X \to R^n$ in this case. Observe that if $f$ is measurable, then
the set $D$ on which it is defined is necessarily a measurable set, for $D = f^{-1}(R^n)$,
and $R^n$ is open in $R^n$.

Exercise 1  A function $f: X \to R^n$ is measurable if and only if $f^{-1}(F)$ is
measurable for every closed set $F \subset R^n$.

Exercise 2  A function $f: X \to R^n$ is measurable if and only if $f^{-1}(Q)$ is
measurable for every open cube $Q \subset R^n$. (Hint: Every open set is the union of a
sequence of cubes.)

Note the analogy between measurability and continuity. If $X$ is a metric
space, then $f: X \to R^n$ is continuous if and only if $f^{-1}(G)$ is open for every open
$G \subset R^n$.

Exercise 3  Prove this statement if you have not already done so. (In this case it
is assumed that $f$ is defined on all of $X$, not just on a subset.)

From the above characterization of continuity and Theorem 4.2 we get
the following theorem.

THEOREM 5.2  If $X$ is a metric space in which $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever
$A$ and $B$ are a positive distance apart, then every continuous $f: X \to R^n$ is
measurable. In particular, every continuous $f: R^m \to R^n$ is Lebesgue measurable.


The measurable functions will turn out to be the ones suitable for integration.
We shall show presently that they form a much larger class than the continuous
functions (which are essentially the ones suitable for Riemann integration),
but some other results come first.


THEOREM 5.3  Let $X \xrightarrow{f} R^n \xrightarrow{\varphi} R^m$, where $f$ is measurable and
$\varphi$ is continuous. Then $\varphi \circ f$ is measurable.

Proof  If $G$ is an open set in $R^m$, then $G_1 = \varphi^{-1}(G)$ is open in $R^n$; hence
$(\varphi \circ f)^{-1}(G) = f^{-1}(G_1)$ is measurable.

THEOREM 5.4  $f: X \to R^n$ is measurable if and only if each coordinate function is
measurable.

Proof  Let $\varphi_k(y) = y_k$. If $f$ is measurable, then $f_k = \varphi_k \circ f$ is measurable
by Theorem 5.3. Suppose, on the other hand, that each $f_k$ is measurable.
If $Q$ is the open cube
\[ Q = \{y \in R^n : a_k < y_k < b_k \text{ for } k = 1, \ldots, n\}, \]
then $f(x)$ belongs to $Q$ if and only if $f_k(x)$ belongs to $I_k = (a_k, b_k)$ for each $k$.
Thus,
\[ f^{-1}(Q) = \bigcap_{k=1}^{n} f_k^{-1}(I_k). \]
Each set on the right is measurable; therefore, so is the intersection.
Exercise 2 finishes the proof.

THEOREM 5.5  Let $f, g: X \to R^n$ be measurable. Then $f + g$, $\alpha f$, $(f, g)$, and
$|f|$ are all measurable.

Proof  Write $(f, g)$ for the function from $X$ to $R^{2n}$ whose first $n$ coordinates are
those of $f$ and last $n$ those of $g$. Now, $f + g = \varphi \circ (f, g)$, where
$\varphi((x, y)) = x + y$. From Theorem 5.4 it follows that $(f, g)$ is measurable, and
it is plain that $\varphi: R^{2n} \to R^n$ is continuous, so Theorem 5.3 shows that $f + g$
is measurable.

Exercise 4  Carry out the details in the other three cases.

THEOREM 5.6  If $f: X \to R^1$, then the following are equivalent.

(a) $f$ is measurable.

(b) $\{x : f(x) < a\}$ is measurable for every real $a$.

(c) $\{x : f(x) \le a\}$ is measurable for every real $a$.

(d) $\{x : f(x) > a\}$ is measurable for every real $a$.

(e) $\{x : f(x) \ge a\}$ is measurable for every real $a$.

Proof  Since the intervals
\[ I_a = (-\infty, a) \quad\text{and}\quad J_a = (a, \infty) \]
are open, and the intervals $\bar{I}_a$ and $\bar{J}_a$ are closed, it follows that the inverse
image of each of these is measurable if $f$ is measurable. Consequently,
(a) implies each of the others. Suppose, on the other hand, that (b)
(for example) holds. We have $\bar{I}_a = \bigcap_n I_{a+1/n}$, so
$f^{-1}(\bar{I}_a) = \bigcap_n f^{-1}(I_{a+1/n})$ is measurable, and, therefore, so is
\[ f^{-1}(J_a) = D - f^{-1}(\bar{I}_a), \]
where $D$ is the set on which $f$ is defined. (Why is $D$ measurable?)
This shows that
\[ \{x : a < f(x) < b\} = f^{-1}(J_a) \cap f^{-1}(I_b) \]
is measurable, and then Exercise 2 shows that $f$ is measurable. The
other conditions are handled similarly.

THEOREM 5.7  If $\{f_k\}$ is a sequence of measurable real-valued functions, then the
following are all measurable:
\[ \sup f_k, \quad \inf f_k, \quad \limsup f_k, \quad \liminf f_k. \]

Proof  It is plain that $f(x) = \sup f_k(x)$ is $\le a$ if and only if each $f_k(x)$ is $\le a$.
In other words,
\[ f^{-1}(\bar{I}_a) = \bigcap_k f_k^{-1}(\bar{I}_a), \]
and this is measurable, since each set on the right is measurable. The
inf is handled similarly. To get the lim sup, note that
$g_n(x) = \sup\{f_k(x) : k \ge n\}$ is measurable by what has just been proved, and,
therefore, so is $\limsup f_k = \inf g_n$.

Throughout the section we have been a little sloppy about the sets on
which the various functions are defined. In Theorem 5.2, for example, the
continuous function $f$ is defined on the whole space $X$. In Theorem 5.3 the
continuous function $\varphi$ is defined on the whole space $R^n$. In Theorem 5.7 the
functions $\sup f_k$ and $\limsup f_k$ are defined at a point $x$ if and only if each $f_k$
is defined at $x$ and the sequence $\{f_k(x)\}$ is bounded above.

Exercise 5  Look back at each of the theorems of the section to make sure that you
understand the domains of the various functions.

Exercise 6  It would appear more natural to define the $\limsup f_k$ on a different
set. Tell what the different set is and reprove the theorem.
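
On a finite set the criteria of this section can be tested directly. The sketch
below is an illustration and not part of the text: the toy outer measure `mu`
(the number of blocks of a fixed partition that a set meets) and all helper names
are assumptions made for the example. Its measurable sets turn out to be the
unions of blocks, and criterion (b) of Theorem 5.6 then says that a function is
measurable exactly when it is constant on each block.

```python
from itertools import combinations

BLOCKS = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
X = frozenset().union(*BLOCKS)

def mu(S):
    """Toy outer measure: the number of blocks that S meets."""
    return sum(1 for b in BLOCKS if b & S)

def subsets(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

ALL = subsets(X)
# Sets passing the measurability test mu(S) = mu(S & A) + mu(S - A) for all S.
MEASURABLE = {A for A in ALL
              if all(mu(S) == mu(S & A) + mu(S - A) for S in ALL)}

def is_measurable_function(f):
    """Criterion (b): {x : f(x) < a} is measurable for every real a.

    Only finitely many distinct sublevel sets occur, so it is enough to test
    a equal to each value of f and one a above the largest value.
    """
    values = sorted(set(f.values()))
    thresholds = values + [values[-1] + 1]
    return all(frozenset(x for x in X if f[x] < a) in MEASURABLE
               for a in thresholds)

f = {0: 1, 1: 1, 2: 5, 3: 5, 4: 2, 5: 2}   # constant on each block
g = {0: 1, 1: 2, 2: 5, 3: 5, 4: 2, 5: 2}   # splits the block {0, 1}
h = {x: max(f[x], 3) for x in X}           # sup of f and a constant

print(len(MEASURABLE), is_measurable_function(f),
      is_measurable_function(g), is_measurable_function(h))
```

The function `h` is the pointwise supremum of two measurable functions, so its
passing the test is an instance of Theorem 5.7.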


6  DEFINITION OF THE INTEGRAL


DEFINITION 6.1  A simple function on $X$ is a nonnegative measurable function
$\varphi: X \to R^1$ which is defined everywhere and takes only a finite number of
distinct values. If these are $\alpha_1, \ldots, \alpha_m$, and $E_j = \varphi^{-1}(\{\alpha_j\})$, then
\[ I(\varphi) = \sum_j \alpha_j \mu(E_j). \tag{1} \]

Each $E_j$ is measurable, since the single point $\{\alpha_j\}$ is a closed set. It may
have infinite measure, of course, in which case the value of the sum in (1) is
$\infty$, with one exception. In integration theory the product $0 \cdot \infty$ is always
taken to be 0. Thus, if one of the $\alpha_j$ is 0, then the term $\alpha_j \mu(E_j)$ is 0, whether
$\mu(E_j)$ is $\infty$ or not. It will turn out, of course, that $I(\varphi)$ is the integral of $\varphi$,
but we shall give a general definition of the integral that applies to nonsimple
functions as well and then will show that the general definition gives back
$I(\varphi)$ in the case of simple functions. First it is necessary to develop a few
properties of $I(\varphi)$.

THEOREM 6.2  Let $\varphi$ be simple. If $X = \bigcup X_k$, where the $X_k$ are measurable and
disjoint and $\varphi$ takes the constant value $\beta_k$ on $X_k$, then
\[ I(\varphi) = \sum_k \beta_k \mu(X_k). \tag{2} \]

Proof  The point is that the $\beta_k$ may not be distinct. However, each $\beta_k$ is equal
to some $\alpha_j$. If $S_1$ is the set of indices $k$ for which $\beta_k = \alpha_1$, $S_2$ is the set
of indices $k$ for which $\beta_k = \alpha_2$, and so on, then
\[ E_j = \bigcup_{k \in S_j} X_k; \quad\text{hence}\quad \mu(E_j) = \sum_{k \in S_j} \mu(X_k). \]
It is clear that the sum (1) is obtained from the sum (2) by grouping the
terms in this way.

THEOREM 6.3  If $\varphi$ and $\psi$ are simple, then $I(\varphi + \psi) = I(\varphi) + I(\psi)$.

Proof  Write $X = \bigcup X_k$, where the $X_k$ are measurable and disjoint and $\varphi$ takes
the constant value $\alpha_k$ on $X_k$, and $\psi$ takes the constant value $\beta_k$ on $X_k$.
Then $\varphi + \psi$ takes the constant value $\alpha_k + \beta_k$ on $X_k$, and Theorem 6.2
gives
\[ I(\varphi + \psi) = \sum_k (\alpha_k + \beta_k)\mu(X_k)
   = \sum_k \alpha_k \mu(X_k) + \sum_k \beta_k \mu(X_k) = I(\varphi) + I(\psi). \]

THEOREM 6.4  If $\varphi$ and $\psi$ are simple and $\varphi \le \psi$ except on a set of measure 0,
then $I(\varphi) \le I(\psi)$.


Proof  Suppose first that $\varphi \le \psi$ everywhere. Then in the notations of the last
proof we have $\alpha_k \le \beta_k$ for each $k$, which obviously implies that
$I(\varphi) \le I(\psi)$. Now suppose that $\varphi \le \psi$ except on some set $N$ of measure 0.
Let $m$ be an upper bound for $\varphi$, and let $\chi$ be the function that is $m$ on $N$ and 0
everywhere else. Since $I(\chi) = m\mu(N) = 0$, and since $\varphi \le \psi + \chi$ everywhere,
we have
\[ I(\varphi) \le I(\psi + \chi) = I(\psi) + I(\chi) = I(\psi). \]

DEFINITION 6.5  A property of points in $X$ is said to hold almost everywhere, or
a.e., if it holds except on a set of measure 0.

In this terminology the statement of the last theorem is that if $\varphi$ and $\psi$
are simple and $\varphi \le \psi$ a.e., then $I(\varphi) \le I(\psi)$.

DEFINITION 6.6  If $f: X \to R^1$ is nonnegative and measurable, then
\[ \int f\,d\mu = \sup I(\varphi), \tag{3} \]
where the upper bound is taken over all simple $\varphi$ that are $\le f$ at every point
where $f$ is defined.

Exercise 1  The definition remains the same if the upper bound is taken over all
simple $\varphi$ that are $\le f$ at almost every point where $f$ is defined.

Exercise 2  If $f$ is undefined on a set of positive measure, then
$\int f\,d\mu = \infty$.

THEOREM 6.7  If $\varphi$ is simple, then $\int \varphi\,d\mu = I(\varphi)$.

Proof  Since $\varphi \le \varphi$, it follows that $\int \varphi\,d\mu \ge I(\varphi)$. On the other hand,
if $\psi \le \varphi$, then Theorem 6.4 gives $I(\psi) \le I(\varphi)$. Therefore, the upper bound
of $I(\psi)$, which is the integral of $\varphi$, is also $\le I(\varphi)$.

THEOREM 6.8  Let $f$ and $g$ be nonnegative and measurable and defined a.e. If
$f \le g$ a.e., then $\int f\,d\mu \le \int g\,d\mu$.

Proof  If a simple $\varphi$ is $\le f$ a.e., then it is also $\le g$ a.e., so by Exercise 1 it
follows that $I(\varphi) \le \int g\,d\mu$. Since this holds for every $\varphi$, it also holds for
the upper bound, which is the integral of $f$.

LEMMA 6.9  Let $f$ be nonnegative and measurable and defined a.e. There is a
nondecreasing sequence of simple functions that converges to $f$ a.e.


Proof  It is no loss of generality to assume that $f$ is defined everywhere, for we
can simply define it to be 0 (for example) on the set where it is initially
undefined. In this case we shall get a sequence of simple functions that
converges everywhere to $f$.

If $f_k = \min(f, k)$, then the construction in Section 1 provides a
simple $\varphi_k$ satisfying
\[ f_k(x) - \frac{1}{k} \le \varphi_k(x) \le f_k(x) \quad\text{for all } x. \]
This is formula (3) in Section 1 with $f_k$ for $f$ and $1/k$ for $\varepsilon$. Now put
\[ \psi_k = \max(\varphi_1, \ldots, \varphi_k). \]
It is evident that $\{\psi_k\}$ does the job.

Exercise 3  If $f$ happens to be bounded, then $\psi_k \to f$ uniformly.

There is sometimes occasion to integrate over a subset of $X$ rather than
over the whole space.

DEFINITION 6.10  If $f$ is a nonnegative measurable function and $A$ is a measurable
set, then
\[ \int_A f\,d\mu = \int \chi_A f\,d\mu, \]
where $\chi_A$, the characteristic function of $A$, is the function with value 1 on $A$
and value 0 elsewhere.

Note that if $\varphi$ is a simple function taking the values $\alpha_1, \ldots, \alpha_m$ on the
disjoint sets $E_1, \ldots, E_m$, then $\chi_A \varphi$ is the simple function taking the values
$\alpha_1, \ldots, \alpha_m$ on the sets $A \cap E_1, \ldots, A \cap E_m$. Consequently,
\[ \int_A \varphi\,d\mu = \sum_{j=1}^{m} \alpha_j \mu(A \cap E_j). \tag{4} \]

Remark  It is usual in integration theory to define the "product" $0 \times ?$ to be 0.
In particular, the product $\chi_A f$ is defined to be 0 at each point outside $A$,
whether $f$ is defined there or not.

Exercise 4  If $f$ and $A$ are measurable, then $\chi_A f$ is measurable when defined in
this new way.
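
Formula (1) of Definition 6.1 and formula (4) above are finite sums, so they are
easy to evaluate mechanically. The sketch below is an illustration and not part of
the text: the function names are assumptions made for the example, and the
counting measure on a small finite set stands in for $\mu$. The one point that
needs care is the convention $0 \cdot \infty = 0$, because floating-point
arithmetic does not follow it.

```python
import math

def integral_simple(values, measures):
    """I(phi) = sum_j alpha_j * mu(E_j), with the convention 0 * inf = 0.

    values[j]   : the value alpha_j >= 0 taken by phi on E_j
    measures[j] : mu(E_j), possibly math.inf
    """
    total = 0.0
    for alpha, m in zip(values, measures):
        if alpha == 0:
            continue            # the term is 0 even when m is infinite
        total += alpha * m
    return total

def integral_simple_over(values, pieces, A, mu):
    """Formula (4): the integral over A is sum_j alpha_j * mu(A & E_j)."""
    return integral_simple(values, [mu(A & E) for E in pieces])

# phi takes the value 2 on E_1 = {1, 2, 3} and 7 on E_2 = {4, 5};
# mu is the counting measure, so mu(E) = len(E).
pieces = [frozenset({1, 2, 3}), frozenset({4, 5})]
values = [2.0, 7.0]
A = frozenset({3, 4, 5})

print(integral_simple(values, [len(E) for E in pieces]))   # integral over X
print(integral_simple_over(values, pieces, A, len))        # integral over A
print(integral_simple([0.0, 3.0], [math.inf, 2.0]))        # 0 * inf term dropped
```

Without the explicit check for `alpha == 0`, the last call would compute
`0.0 * math.inf`, which is `nan` in IEEE arithmetic rather than 0.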

7  CONVERGENCE THEOREMS


An advantage of the Lebesgue integral over the Riemann integral is the avail- 
ability of very powerful convergence theorems. The first (and the one from 
which the others come easily) is the following. 


THEOREM 7.1  (Monotone Convergence Theorem) Let $\{f_k\}$ be a nondecreasing sequence
of nonnegative measurable functions with limit $f$. Then
\[ \int f\,d\mu = \lim \int f_k\,d\mu. \tag{1} \]

The limit of the sequence of functions is to be taken in the strict sense:
the limit $f$ is defined at a point $x$ if and only if each of the $f_k$ is defined there
and the limit of $\{f_k(x)\}$ exists. It may as well be assumed that each $f_k$ is
defined a.e., for otherwise both sides of (1) are automatically $+\infty$. It is not
assumed, however, that $f$ is defined a.e., so the theorem gives a powerful
method for proving that limits exist a.e.: If the limit of the integrals is finite,
then the integral of $f$ is finite, and so $f$ must be defined a.e. The slick proof
given here comes from W. Rudin. It begins with a lemma, which is in fact
a special case of the theorem.

LEMMA 7.2  If $\varphi$ is simple and $\{A_k\}$ is an increasing sequence of measurable sets
with union $A$, then
\[ \int_A \varphi\,d\mu = \lim \int_{A_k} \varphi\,d\mu. \]

Proof  Application of formula (4) of the last section gives
\[ \int_A \varphi\,d\mu = \sum_{j=1}^{m} \alpha_j \mu(A \cap E_j) \tag{2} \]
and
\[ \int_{A_k} \varphi\,d\mu = \sum_{j=1}^{m} \alpha_j \mu(A_k \cap E_j). \tag{3} \]
Now, Exercise 2 of Section 3 does the job, for it shows that each term in
the finite sum (3) converges to the corresponding term in (2).

Proof of the Theorem  It is plain from Theorem 6.8 that
\[ \int f\,d\mu \ge \lim \int f_k\,d\mu. \]
To prove the opposite inequality it is enough to show that if $\varphi$ is simple
and $\le f$ where $f$ is defined, then
\[ \int \varphi\,d\mu \le \lim \int f_k\,d\mu. \tag{4} \]
As observed above, it can be assumed that each $f_k$ is defined a.e.,
in which case there is a set $A$ with complement of measure 0 such that
each $f_k$ is defined everywhere on $A$ and for every point $x$ in $A$ the sequence
$\{f_k(x)\}$ is nondecreasing. If $c$ is a positive number less than 1, and if
\[ A_k = \{x \in A : f_k(x) \ge c\varphi(x)\}, \]
then $\{A_k\}$ is a nondecreasing sequence with union $A$. The lemma gives
\[ c\int \varphi\,d\mu = \int c\varphi\,d\mu = \int_A c\varphi\,d\mu = \lim \int_{A_k} c\varphi\,d\mu
   \le \lim \int_{A_k} f_k\,d\mu \le \lim \int f_k\,d\mu. \]
Since this holds for every positive $c$ less than 1, it proves formula (4) and
hence the theorem.

A second basic convergence theorem is the following.

THEOREM 7.3  (Fatou's Lemma) If $\{f_k\}$ is a sequence of nonnegative measurable
functions defined a.e., then
\[ \int (\liminf f_k)\,d\mu \le \liminf \int f_k\,d\mu. \tag{5} \]

Proof  If $g_n(x) = \inf\{f_k(x) : k \ge n\}$, then $\{g_n\}$ is nondecreasing and its limit
is $\liminf f_k$. The monotone convergence theorem gives
\[ \int (\liminf f_k)\,d\mu = \lim \int g_n\,d\mu = \liminf \int g_n\,d\mu
   \le \liminf \int f_n\,d\mu, \]
the last inequality coming from the fact that $g_n \le f_n$.

THEOREM 7.4  If $f$ and $g$ are nonnegative and measurable, then
\[ \int (f + g)\,d\mu = \int f\,d\mu + \int g\,d\mu. \]

Proof  It can be assumed that $f$ and $g$ are defined a.e., for otherwise both sides
are $+\infty$. Use Lemma 6.9 to find nondecreasing sequences $\{\varphi_k\}$ and
$\{\psi_k\}$ of simple functions that converge to $f$ and $g$ a.e. The monotone
convergence theorem gives
\[ \int (f + g)\,d\mu = \lim \int (\varphi_k + \psi_k)\,d\mu
   = \lim \int \varphi_k\,d\mu + \lim \int \psi_k\,d\mu = \int f\,d\mu + \int g\,d\mu. \]

THEOREM 7.5  Any series of nonnegative measurable functions can be integrated term
by term.

Exercise 1  Prove the theorem.

THEOREM 7.6  If $f$ is nonnegative and measurable, and $\{E_k\}$ is a disjoint sequence
of measurable sets with union $E$, then
\[ \int_E f\,d\mu = \sum_{k=1}^{\infty} \int_{E_k} f\,d\mu. \]

Exercise 2  Prove the theorem.
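
The inequality in Fatou's lemma can be strict, and the monotone convergence theorem
can fail without monotonicity. A standard example, which is not taken from the
text, is $f_k = k$ on $(0, 1/k)$ and 0 elsewhere on $[0, 1]$ with the Lebesgue
measure. The sketch below uses exact rational arithmetic to show that every
$\int f_k\,d\mu$ equals 1 while $f_k(x) \to 0$ at each fixed $x$; the function names
are assumptions made for the example.

```python
from fractions import Fraction

def integral_step(height, left, right):
    """Lebesgue integral of height times the indicator of the interval (left, right)."""
    return height * (right - left)

def f(k, x):
    """f_k = k on the open interval (0, 1/k) and 0 elsewhere on [0, 1]."""
    return k if 0 < x < Fraction(1, k) else 0

# Every integral is exactly 1.
integrals = [integral_step(k, Fraction(0), Fraction(1, k)) for k in range(1, 50)]

# At a fixed point x, f_k(x) = 0 as soon as 1/k <= x.
x = Fraction(1, 7)
tail = [f(k, x) for k in range(7, 50)]

print(set(integrals), set(tail))
```

Here $\liminf f_k = 0$ everywhere, so the left side of (5) is 0 while the right
side is 1. The sequence is not nondecreasing, so Theorem 7.1 does not apply to it.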


There is still one more basic convergence theorem, called the dominated
convergence theorem, that applies to functions that are not necessarily nonnegative.
This is discussed in Section 8.


8  INTEGRABLE FUNCTIONS


So far we have considered the integration of nonnegative functions. Now we
take up the rest.

DEFINITION 8.1  A function $f: X \to R^n$ is integrable if it is measurable and
$\int |f|\,d\mu < \infty$.

Exercise 1  $f: X \to R^n$ is integrable if and only if each coordinate function is
integrable.

The natural way to integrate a function with values in $R^n$ is to integrate
each coordinate function separately. The natural way to integrate a real-valued
function is to express it as a difference of nonnegative functions. If
$f: X \to R^1$ is measurable, then so are the functions
\[ f^+ = \max(f, 0) = \frac{|f| + f}{2} \quad\text{and}\quad
   f^- = \max(-f, 0) = \frac{|f| - f}{2}. \]
Both are nonnegative, and, moreover,
\[ f = f^+ - f^- \quad\text{and}\quad |f| = f^+ + f^-. \tag{1} \]

DEFINITION 8.2  If $f: X \to R^1$ is integrable, then
\[ \int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu. \]

Exercise 2  $f$ is integrable if and only if both $f^+$ and $f^-$ are integrable.

The first step is to show that $f$ can be split into positive and negative parts
in any reasonable way at all, not just in the way of formula (1).

LEMMA 8.3  If $f = g - h$, where $g$ and $h$ are nonnegative and integrable, then $f$ is
integrable and
\[ \int f\,d\mu = \int g\,d\mu - \int h\,d\mu. \]

Proof  $f$ is measurable, since both $g$ and $h$ are, and then it is integrable, since
$|f| \le g + h$. The fact that $f^+ - f^- = f = g - h$ gives $f^+ + h = f^- + g$;
then Theorem 7.4 gives
\[ \int f^+\,d\mu + \int h\,d\mu = \int f^-\,d\mu + \int g\,d\mu, \]
from which the formula in the lemma follows directly.


DEFINITION 8.4  If $f: X \to R^n$, the integral of $f$ is obtained by integrating each
coordinate separately. That is, if $f_k$ is the $k$th coordinate of $f$ and
$I_k = \int f_k\,d\mu$, then
\[ \int f\,d\mu = (I_1, \ldots, I_n). \]

THEOREM 8.5  If $f, g: X \to R^n$ are integrable and $\alpha$ is a real number, then
$f + g$ and $\alpha f$ are integrable, and
\[ \int (f + g)\,d\mu = \int f\,d\mu + \int g\,d\mu, \qquad
   \int (\alpha f)\,d\mu = \alpha \int f\,d\mu. \]

Proof  It is clearly enough to treat each coordinate separately, that is, to treat
the case $n = 1$. In this case we have $f = f^+ - f^-$ and $g = g^+ - g^-$, and
then $f + g = (f^+ + g^+) - (f^- + g^-)$, and Lemma 8.3 does the job.

Exercise 3  Why is Lemma 8.3 necessary here rather than just the original
Definition 8.2?

Exercise 4  Integrability and the value of the integral of an $f: X \to R^n$ are both
independent of the coordinate system used in $R^n$.

The third basic convergence theorem is the following.

THEOREM 8.6  (Dominated Convergence Theorem) For each $k$, let $f_k: X \to R^n$ be
integrable and satisfy $|f_k(x)| \le g(x)$ a.e., where $g: X \to R^1$ is some fixed
integrable function. If $f_k \to f$ a.e., then $f$ is integrable, and
\[ \int f_k\,d\mu \to \int f\,d\mu. \]

The function $g$ is said to dominate the sequence $\{f_k\}$. For this reason the
theorem is called the dominated convergence theorem.

Proof  It is plain that $f$ is integrable, for it satisfies $|f| \le g$ a.e. In proving
the convergence, it is enough to treat each coordinate separately, that is, to
treat the case $n = 1$. The advantage of this is to make available Fatou's
lemma for use with the nonnegative functions $g + f_k$. It gives
\[ \int g\,d\mu + \int f\,d\mu = \int (g + f)\,d\mu \le \liminf \int (g + f_k)\,d\mu
   = \int g\,d\mu + \liminf \int f_k\,d\mu; \]
hence
\[ \int f\,d\mu \le \liminf \int f_k\,d\mu. \tag{2} \]
If this formula is applied to the function $-f$ and the sequence $\{-f_k\}$
(which, of course, satisfy the conditions of the theorem), it gives
\[ \int -f\,d\mu \le \liminf \int -f_k\,d\mu = -\limsup \int f_k\,d\mu; \]
hence
\[ \int f\,d\mu \ge \limsup \int f_k\,d\mu. \tag{3} \]


Since lim inf is always $\le$ lim sup, it follows from (2) and (3) that the two
are equal and are equal to the integral of $f$. This implies that the limit
exists and is equal to the integral of $f$.

Exercise 5  Prove the fact used above, that
\[ \liminf(-a_k) = -\limsup a_k. \]

THEOREM 8.7  Let $X = [a, b]$ be a closed bounded interval with the Lebesgue measure.
A function $f: X \to R^n$ is Riemann integrable if and only if it is bounded and
continuous a.e.; if this is the case, the Riemann and Lebesgue integrals are equal.

Proof  Suppose that $f$ is bounded, say $|f| \le M$, and continuous a.e. If $p$ is a
partition, let $\varphi_p$ be the function that takes the constant value $f(\xi_i)$ on the
interval $[x_{i-1}, x_i)$. It is obvious that $\varphi_p$ is measurable, that $|\varphi_p| \le M$, and
that
\[ S(f; p) = \int \varphi_p\,d\mu. \tag{4} \]
($\varphi_p$ is almost a simple function; indeed, it is one if $f$ is real valued and
nonnegative.) Now, let $\{p_k\}$ be any sequence of partitions with $|p_k| \to 0$.
It is plain that $\varphi_{p_k}(x) \to f(x)$ for every $x$ at which $f$ is continuous, hence
for almost every $x$. The dominated convergence theorem gives
\[ S(f; p_k) = \int \varphi_{p_k}\,d\mu \to \int f\,d\mu, \]
which shows that the Riemann integral exists and is equal to the Lebesgue
integral.

As for the converse, it has already been seen that a Riemann integrable
function must be bounded, so what remains is to show that it must be
continuous a.e. Since the coordinate functions can be treated separately,
it is all right to suppose that $f$ is real valued. Let $D$ be the set of
discontinuities, and let
\[ D_n = \Bigl\{x : \limsup_{y \to x} f(y) - \liminf_{y \to x} f(y) \ge \frac{1}{n}\Bigr\}. \tag{5} \]

Exercise 6  Strictly speaking, the limits superior and inferior have not been
defined in exactly this situation. Define them and show that
\[ D = \bigcup_{n=1}^{\infty} D_n. \tag{6} \]

If $D$ has positive measure, then at least one $D_n$ must have positive measure.
(Does this assertion require that the $D_n$ be measurable? Are they measurable?)


13/integration

Choose such an n and fix it. We shall show that for every partition p,

    \overline{S}(f; p) - \underline{S}(f; p) \ge \mu(D_n)/n,

which will show that f is not Riemann integrable.
Let I_1, ..., I_m be the intervals in p that meet D_n in at least one interior
point (of the interval, that is). These cover D_n except possibly for the
finite number of end points of the intervals in p. Therefore,

    \sum_{j=1}^{m} \mu(I_j) \ge \mu(D_n).

Moreover, if M_j and m_j are the upper and lower bounds of f on I_j, then
M_j − m_j ≥ 1/n. It follows that

    \overline{S}(f; p) - \underline{S}(f; p) \ge \sum_{j=1}^{m} (M_j - m_j)\,\mu(I_j) \ge \mu(D_n)/n.

Example Calculate \int_0^1 x^a \, dx.

If a ≥ 0, then x^a is Riemann integrable, and the old formulas with primitives
give the value 1/(a + 1). Suppose, however, that a < 0, in which case
x^a is not Riemann integrable, because it is not bounded. Define

    f_n(x) = x^a  if  x \ge 1/n,    f_n(x) = 0  if  x < 1/n.

Now, f_n is Riemann integrable, and for a ≠ −1

    \int_0^1 f_n \, dx = \int_{1/n}^{1} x^a \, dx = \frac{1}{a+1} \left( 1 - \frac{1}{n^{a+1}} \right),

while for a = −1 the value is log n.
This converges if and only if a > −1; then the monotone convergence theorem
shows that x^a is integrable on [0, 1] if and only if a > −1. If this is the case,
the value of the integral is 1/(a + 1).

Exercise 7 An important function, called the gamma function, is defined by

    \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt    for x > 0.

Show that t^{x−1}e^{−t} is integrable on [0, ∞) if x > 0. Show that Γ(1) = 1 and
Γ(x + 1) = xΓ(x).

The exercise shows that if n is a positive integer, then Γ(n + 1) = n!.
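As a numerical sanity check of the example and of the last remark, the sketch below evaluates the truncated integrals of x^a over [1/n, 1] from their closed form and compares Γ(n + 1) with n!. The helper name `truncated_integral` and the sample exponents are our own illustrative choices; `math.gamma` is the gamma function from Python's standard library.

```python
import math

def truncated_integral(a: float, n: int) -> float:
    """Integral of x**a over [1/n, 1], computed from the primitive of x**a."""
    if a == -1.0:
        return math.log(n)                      # the primitive of 1/x is log x
    return (1.0 - n ** (-(a + 1.0))) / (a + 1.0)

# a > -1: the values increase to 1/(a + 1) as n grows (here the limit is 2).
for n in (10, 1_000, 100_000):
    print(n, truncated_integral(-0.5, n))

# a <= -1: the values grow without bound, so x**a is not integrable on [0, 1].
for n in (10, 1_000, 100_000):
    print(n, truncated_integral(-1.5, n))

# Gamma(n + 1) = n! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
```

The closed form is used instead of numerical quadrature so that the dichotomy at a = −1 is visible without any discretization error.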


Exercise 8 If f is nonnegative and improperly Riemann integrable, then it is Lebesgue
integrable and the two integrals are equal. However, the function sin x/x is
improperly Riemann integrable on [0, ∞) and is not Lebesgue integrable.

THEOREM 8.8 If f: X → R^n is integrable, then

    \left| \int f \, d\mu \right| \le \int |f| \, d\mu,

and equality holds if and only if f(x) = |f(x)|v a.e. for some constant vector v.

Proof The theorem is obvious if f is real valued. We shall reduce the general
case to this one by taking suitable inner products. Note first that for any
w ∈ R^n, w ≠ 0, there is one and only one v ∈ R^n, namely v = w/|w|,
such that |v| = 1 and (v, w) = |w|.

To prove the theorem, take w = ∫ f dμ and v = w/|w|, as above. (If
w = 0, the theorem is obvious, so we can suppose that this is not the case.)
We have

    \left| \int f \, d\mu \right| = \left( v, \int f \, d\mu \right) = \int (v, f) \, d\mu \le \int |f| \, d\mu

by Cauchy-Schwarz and the fact that |v| = 1. Now, if equality holds,
then since |(v, f(x))| ≤ |f(x)|, we must have (v, f(x)) = |f(x)| a.e.; then by
the remark at the beginning we must have f(x) = 0 or else v = f(x)/|f(x)|.
In either case f(x) = |f(x)|v.

COROLLARY 8.9 Let f: X → R^n be integrable and satisfy |f(x)| ≤ M a.e. If |∫ f dμ| =
Mμ(X) < ∞, then f is constant a.e.

Proof We have ∫ |f| dμ ≤ Mμ(X) = |∫ f dμ|. The first inequality shows that
|f(x)| = M a.e., and Theorem 8.8 shows that f(x) = |f(x)|v a.e.


9 PRODUCT MEASURES


In dimension one the practical way to calculate an integral is to find a primitive.
In higher dimensions it is to reduce the integral to a succession of one-dimensional
ones. The theorem that tells how to do this is not only important in calculations
but plays a fundamental theoretical role as well. It is simple to state and to
use, but the proof is long and technical. Unfortunately, we shall omit it.

If X and Y are sets, then X × Y is the set of pairs (x, y) with x ∈ X and
y ∈ Y. If X = R^m and Y = R^n, the pair (x, y) can be considered as a point
in R^{m+n}: the point whose first m coordinates are the coordinates of x and whose
last n are the coordinates of y. Thus, R^m × R^n = R^{m+n}. The problem is this:
Given measures μ and ν on X and Y, construct a natural measure μ × ν on
X × Y. It can be solved if the measures μ and ν are σ-finite in the following
sense:

DEFINITION 9.1 An outer measure μ on a set X is σ-finite if X is the union of a sequence of
measurable sets of finite measure.

The definition of the measure μ × ν is similar to that of the Lebesgue
measure.

DEFINITION 9.2 Let μ and ν be σ-finite outer measures on X and Y. For any set E ⊂ X × Y,
define

    (\mu \times \nu)(E) = \inf \sum_k \mu(A_k)\,\nu(B_k),

where the inf is taken over all sequences {A_k} and {B_k} such that E ⊂ ∪_k (A_k × B_k).

The first basic theorem is the following:

THEOREM 9.3 If μ and ν are σ-finite outer measures on X and Y, then μ × ν is a σ-finite
outer measure on X × Y. If A is a μ-measurable set in X, and B is a
ν-measurable set in Y, then A × B is a (μ × ν)-measurable set, and

    (\mu \times \nu)(A \times B) = \mu(A)\,\nu(B).    (1)

If μ and ν are the Lebesgue measures on R^m and R^n, then μ × ν is the Lebesgue
measure on R^{m+n}.

For the second basic theorem we need a little notation. If f: X × Y → Z,
and x is a point of X, then f_x: Y → Z is the function defined by

    f_x(y) = f(x, y).

f_x is called the section of f through x.

THEOREM 9.4 (Fubini's Theorem) Let μ and ν be σ-finite, and let f: X × Y → R^1 be
nonnegative and (μ × ν)-measurable. Then

(a) For almost every x ∈ X, the section f_x is ν-measurable.
(b) The function F(x) = ∫ f_x dν is μ-measurable, and

    \int f \, d(\mu \times \nu) = \int F \, d\mu.

(c) The same statements hold with X and Y interchanged.

Exercise 1 Take X to be the positive integers and μ to be the counting measure. Show
that the notion of a measurable function from X × Y to R^1 is equivalent to the
notion of a sequence of measurable functions from Y to R^1. Use Fubini's
theorem to deduce that a series of nonnegative measurable functions can be
integrated term by term. Use this to deduce the monotone convergence
theorem (in the σ-finite case). This very special case gives some illustration of
the power of the theorem.

Ordinarily, parts (b) and (c) of Fubini's theorem are written in the following way:

    \int f \, d(\mu \times \nu) = \int \left\{ \int f(x, y) \, d\nu(y) \right\} d\mu(x)
                                = \int \left\{ \int f(x, y) \, d\mu(x) \right\} d\nu(y).    (2)

In fact, dμ(x) and dν(y) are often written simply dx and dy, in which case the
formula becomes

    \int f \, d(\mu \times \nu) = \int \left\{ \int f(x, y) \, dy \right\} dx = \int \left\{ \int f(x, y) \, dx \right\} dy.    (3)

Of course, the theorem applies also to measurable functions that are not
necessarily nonnegative.

Exercise 2 Prove the following theorem by splitting each coordinate function of f into its
positive and negative parts.

THEOREM 9.5 Let μ and ν be σ-finite, and let f: X × Y → R^n be (μ × ν)-measurable.
If one of the three integrals below is finite, then formula (2) holds:

    \int |f| \, d(\mu \times \nu),    \int \left\{ \int |f(x, y)| \, d\nu(y) \right\} d\mu(x),    \int \left\{ \int |f(x, y)| \, d\mu(x) \right\} d\nu(y).
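The finiteness hypothesis in Theorem 9.5 cannot simply be dropped. Here is a minimal sketch, assuming X = Y = the nonnegative integers with counting measure and a double sequence a(j, k) chosen by us purely for illustration (+1 on the diagonal, −1 just below it). Each row and each column has at most two nonzero entries, so the inner sums are computed exactly, yet the two iterated sums disagree, because the sum of |a(j, k)| is infinite.

```python
def a(j: int, k: int) -> int:
    """Hypothetical double sequence: +1 on the diagonal, -1 just below it."""
    if j == k:
        return 1
    if j == k + 1:
        return -1
    return 0

def inner_sum_over_k(j: int) -> int:
    # For fixed j, only k = j and k = j - 1 can contribute.
    return sum(a(j, k) for k in range(max(j - 1, 0), j + 1))

def inner_sum_over_j(k: int) -> int:
    # For fixed k, only j = k and j = k + 1 can contribute.
    return sum(a(j, k) for j in range(k, k + 2))

R = 1000  # outer truncation; every inner sum beyond the first row is already 0
sum_k_then_j = sum(inner_sum_over_k(j) for j in range(R))   # inner sum on k, outer on j
sum_j_then_k = sum(inner_sum_over_j(k) for k in range(R))   # inner sum on j, outer on k
print(sum_k_then_j, sum_j_then_k)   # prints "1 0"
```

Only the row j = 0 has a nonzero sum, while every column sums to 0, so the outer truncation level R does not affect either result.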


One frequent application of Fubini's theorem is to the case when f is the
characteristic function of a set E in X × Y. In this case the section f_x is the
characteristic function of the set

    E_x = \{ y \in Y : (x, y) \in E \},

which is called the section of E through x. Fubini's theorem shows that if E is
(μ × ν)-measurable, then E_x is ν-measurable for almost every x, and

    (\mu \times \nu)(E) = \int \nu(E_x) \, d\mu(x).    (4)

In particular, E has (μ × ν)-measure 0 if and only if almost all sections have
ν-measure 0. This is one of the most common ways to show that a set in R^n
has measure 0.

Quite often the key problem in the applications of Fubini's theorem is to
prove that the set or function involved is (μ × ν)-measurable. To prove that
a set E ⊂ X × Y has measure 0, for example, it is not enough to prove that
almost all sections have measure 0. It must be proved also that the set is
(μ × ν)-measurable. This is something of an anomaly, because all the sets and
functions that actually do turn up in practice are measurable!


Exercise 3 Let X = R^n and μ be the Lebesgue measure, and let Y = R^1 and ν be the
Lebesgue measure. A function f: X → R^1 is μ-measurable if and only if its
graph is (μ × ν)-measurable. Notice that the sections of the graph are just
single points and, consequently, have ν-measure 0. Hence, the function is
μ-measurable if and only if the graph has (μ × ν)-measure 0.


As a first example of the use of Fubini's theorem, let us show that the volume
of a cone in R^n is 1/n times the area of the base times the height. The first task
is to put straight what the result says.

Let B' be a measurable set in R^{n-1} and let h > 0. The set B' is moved up
to the plane x_n = h as follows (see Figure 4):

    B = \{ x \in R^n : x' \in B' \text{ and } x_n = h \},

where, as usual, x' denotes the first n − 1 coordinates of x. Then the cone with
vertex 0, base B', and height h is the set

    C = \{ \alpha x : x \in B \text{ and } 0 \le \alpha \le 1 \}.

The result is that

    \mu_n(C) = \frac{1}{n} h \, \mu_{n-1}(B'),    (5)

where, of course, μ_n and μ_{n-1} are the Lebesgue measures in R^n and R^{n-1}. Note
that the cone is the union of the line segments joining points of B to 0.

Figure 4

The section C_t through a point t ∈ R^1 is obtained as follows: The point
(y', t) belongs to C if and only if (y', t) = α(x', h), where x' ∈ B' and 0 ≤ α ≤ 1,
and, therefore, if and only if

    y' = \frac{t}{h} x',    x' \in B',    \text{and}    0 \le t \le h.    (6)

Consequently, C_t = T(B'), where T: R^{n-1} → R^{n-1} is the linear transformation
given by (6). It follows from Exercise 13 of Section 2 that

    \mu_{n-1}(C_t) = \left( \frac{t}{h} \right)^{n-1} \mu_{n-1}(B'),    0 \le t \le h,

and hence from Fubini that

    \mu_n(C) = \int_0^h \mu_{n-1}(C_t) \, dt = \frac{1}{n} h \, \mu_{n-1}(B').

Strictly speaking, the calculations are not justified until it is shown that the
cone C is measurable. If the base B' is closed, then the cone is closed. If B' is
open, then C − (B' ∪ {0}) is open, so there is no problem in these cases. The
general case can be handled by these two statements and formula (5) itself.

Exercise 4 Do this.
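Formula (5) can be checked numerically by carrying out the Fubini step directly: integrate the section measures (t/h)^{n−1} μ_{n−1}(B') over 0 ≤ t ≤ h with a midpoint rule. In this sketch the function name, the step count, and the sample base (the unit disk, of area π, which gives a cone in R^3) are our own illustrative assumptions.

```python
import math

def cone_volume(base_measure: float, h: float, n: int, steps: int = 100_000) -> float:
    """Midpoint-rule value of the integral over [0, h] of (t/h)**(n-1) * base_measure."""
    dt = h / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt                       # midpoint of the i-th subinterval
        total += (t / h) ** (n - 1) * base_measure
    return total * dt

base = math.pi            # area of the unit disk in R^2
h, n = 3.0, 3
numeric = cone_volume(base, h, n)
exact = h * base / n      # right side of formula (5)
print(numeric, exact)
```

The two printed numbers should agree to many decimal places; only the measure of the base enters, not its shape, which is exactly what the section formula says.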


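Next let us calculate the volume (i.e., measure) of the unit ball B^n = B(0; 1)
in R^n. If we write R^n = R^{n-1} × R^1, then the section B^n_{x'} through a point
x' ∈ R^{n-1} is the interval

    -\sqrt{1 - |x'|^2} < x_n < \sqrt{1 - |x'|^2}.

Therefore,

    |B^n| = \int_{B^{n-1}} \int_{-\sqrt{1-|x'|^2}}^{\sqrt{1-|x'|^2}} dx_n \, dx' = \int_{B^{n-1}} 2\sqrt{1 - |x'|^2} \, dx',

which suggests that for the purpose of induction it will be advantageous to start
with the integral

    I(m, n) = \int_{B^n} (1 - |x|^2)^{m/2} \, dx = \int_{B^{n-1}} \int_{-\sqrt{1-|x'|^2}}^{\sqrt{1-|x'|^2}} (1 - |x'|^2 - x_n^2)^{m/2} \, dx_n \, dx'.

If we write a = √(1 − |x'|²) and t = x_n, the inner integral is

    \int_{-a}^{a} (a^2 - t^2)^{m/2} \, dt = a^{m+1} \int_{-\pi/2}^{\pi/2} \cos^{m+1}\theta \, d\theta.

Therefore,

    I(m, n) = I(m + 1, n - 1) \int_{-\pi/2}^{\pi/2} \cos^{m+1}\theta \, d\theta.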


Using this formula to evaluate I(m + 1, n − 1), then I(m + 2, n − 2), and so
on, we get

    I(m, n) = \int_{-\pi/2}^{\pi/2} \cos^{m+1}\theta \, d\theta \cdots \int_{-\pi/2}^{\pi/2} \cos^{m+n}\theta \, d\theta;

then, taking m = 0, we get

    |B^n| = \int_{-\pi/2}^{\pi/2} \cos\theta \, d\theta \cdots \int_{-\pi/2}^{\pi/2} \cos^{n}\theta \, d\theta.    (7)

The integrals of the powers of the cosine are evaluated in Section 4 of Chapter 4.
(But they are evaluated in a better way below.)

THEOREM 9.6 If f: R^2 → R^1 is nonnegative and measurable, then

    \int\!\!\int f(x, y) \, dx \, dy = \int_0^{2\pi} \int_0^\infty f(r\cos\theta, r\sin\theta) \, r \, dr \, d\theta.

The theorem says that if we put x = r cos θ and y = r sin θ in a double integral,
then we should put dx dy = r dθ dr = r dr dθ. The theorem is useful when the
function f has some radial symmetry and in various other ways. The pair
(r, θ) is called the polar coordinates of the point (x, y).

Proof First we shall integrate over the right half-plane, and we shall use Fubini
to integrate first with respect to y, then with respect to x. In the first
integral x is constant, and we make the change of variable y = x tan θ.
This gives

    \int_0^\infty \int_{-\infty}^{\infty} f(x, y) \, dy \, dx = \int_0^\infty \int_{-\pi/2}^{\pi/2} f(x, x\tan\theta) \, x \sec^2\theta \, d\theta \, dx.

Now use Fubini to integrate first with respect to x and then with respect
to θ. In the first integral, where θ is constant, make the change of variable
x = r cos θ to get the theorem.

Exercise 5 What about the measurability required to use Fubini the second time?
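A quick numerical illustration of Theorem 9.6, assuming the test function f(x, y) = e^{−(x²+y²)} (our choice, not the book's): the Cartesian double integral and the polar form with the extra factor r should agree, and both should be close to π. Both integrals are truncated to a bounded region and approximated by midpoint sums, so the agreement is only approximate; the helper names and grid sizes are likewise assumptions.

```python
import math

def f(x: float, y: float) -> float:
    return math.exp(-(x * x + y * y))

def cartesian(L: float = 6.0, n: int = 400) -> float:
    """Midpoint sum of f over the square [-L, L] x [-L, L]."""
    d = 2 * L / n
    pts = [-L + (i + 0.5) * d for i in range(n)]
    return sum(f(x, y) for x in pts for y in pts) * d * d

def polar(R: float = 6.0, nr: int = 400, nt: int = 400) -> float:
    """Midpoint sum of f(r cos t, r sin t) * r over 0 < r < R, 0 < t < 2*pi."""
    dr, dt = R / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            total += f(r * math.cos(t), r * math.sin(t)) * r   # note the factor r
    return total * dr * dt

print(cartesian(), polar(), math.pi)
```

Dropping the factor r in `polar` makes the two values disagree, which is a simple way to see why dx dy must be replaced by r dr dθ.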


Now we shall use Theorem 9.6 to get a nice formula relating the gamma
function to integrals of powers of sines and cosines. Recall that

    \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} \, dt.    (8)

We have already established the basic formula

    \Gamma(x + 1) = x\,\Gamma(x),    \Gamma(1) = 1.    (9)

If we put t = s² in formula (8), we get

    \Gamma(x) = 2 \int_0^\infty s^{2x-1} e^{-s^2} \, ds.    (10)


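Now calculate the product Γ(x)Γ(y) by using first Fubini and then polar coordinates.
The result is

    \Gamma(x)\Gamma(y) = 2 \int_0^\infty s^{2x-1} e^{-s^2} \, ds \cdot 2 \int_0^\infty t^{2y-1} e^{-t^2} \, dt
                       = 4 \int_0^\infty \int_0^\infty s^{2x-1} t^{2y-1} e^{-(s^2+t^2)} \, ds \, dt
                       = 4 \int_0^{\pi/2} \int_0^\infty (r\cos\theta)^{2x-1} (r\sin\theta)^{2y-1} e^{-r^2} \, r \, dr \, d\theta
                       = 2\,\Gamma(x+y) \int_0^{\pi/2} \cos^{2x-1}\theta \, \sin^{2y-1}\theta \, d\theta.

Thus, we have the following nice formula:

    \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = 2 \int_0^{\pi/2} \cos^{2x-1}\theta \, \sin^{2y-1}\theta \, d\theta.    (11)

In particular, if we take 2x − 1 = m and 2y − 1 = 0, we get (the cosine being
even, twice the integral over [0, π/2] is the integral over [−π/2, π/2])

    \int_{-\pi/2}^{\pi/2} \cos^{m}\theta \, d\theta = \frac{\Gamma\left(\frac{m+1}{2}\right) \Gamma\left(\frac{1}{2}\right)}{\Gamma\left(\frac{m+2}{2}\right)}.    (12)

Notice that formula (11) also shows (2x − 1 = 0, 2y − 1 = 0) that

    \Gamma(1/2) = \sqrt{\pi}.    (13)

Exercise 6 Use formulas (7), (12), and (13) to show that

    |B^n| = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}.    (14)

Write this out using formula (9).

Exercise 7 There are some important examples of Fubini-like theorems where the measures
μ and ν are not σ-finite, but in general this hypothesis cannot be avoided. Make
an example in which X = R^1 with Lebesgue measure and Y = R^1 with counting
measure, and f is the characteristic function of a suitable set such as is shown in
Figure 5.

Exercise 8 The Hausdorff measure μ_m is not σ-finite on R^n if m < n. (Hint: Exercise 12 of
Section 3.)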
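Formulas (11), (13), and (14) lend themselves to a numerical check. The sketch below uses only `math.gamma` from Python's standard library. It compares both sides of (11) for sample exponents, compares Γ(1/2)² with π, and compares the product of cosine integrals in (7) with π^{n/2}/Γ(n/2 + 1). The quadrature helper `midpoint`, the other function names, and the sample values are our own assumptions; the computation illustrates the identities and does not replace the proof asked for in Exercise 6.

```python
import math

def midpoint(g, a: float, b: float, n: int = 20_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def beta_side(x: float, y: float) -> float:
    """Right side of (11): 2 * integral over [0, pi/2] of cos^(2x-1) * sin^(2y-1)."""
    g = lambda t: math.cos(t) ** (2 * x - 1) * math.sin(t) ** (2 * y - 1)
    return 2 * midpoint(g, 0.0, math.pi / 2)

def ball_volume_by_cosines(n: int) -> float:
    """Product in (7): integrals of cos^m over [-pi/2, pi/2] for m = 1, ..., n."""
    vol = 1.0
    for m in range(1, n + 1):
        vol *= midpoint(lambda t, m=m: math.cos(t) ** m, -math.pi / 2, math.pi / 2)
    return vol

x, y = 1.5, 2.0
print(beta_side(x, y), math.gamma(x) * math.gamma(y) / math.gamma(x + y))   # formula (11)
print(math.gamma(0.5) ** 2, math.pi)                                        # formula (13)
for n in range(1, 6):
    # formula (7) against formula (14)
    print(n, ball_volume_by_cosines(n), math.pi ** (n / 2) / math.gamma(n / 2 + 1))
```

For n = 2 and n = 3 the last loop should reproduce the familiar values π and 4π/3.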


Figure 5


10 FUNCTIONS DEFINED BY INTEGRALS

We shall consider two kinds of functions: the indefinite integrals

    F(s) = \int_a^s f(t) \, dt

and the functions of the form

    F(t) = \int f(x, t) \, d\mu(x).

The results here are by no means the best possible ones, but they are good
enough to have interesting applications.

THEOREM 10.1 Let f: R^1 → R^n be integrable on each compact subinterval of an open interval I,
and let

    F(s) = \int_a^s f(t) \, dt.

Then F is continuous at each point of I and satisfies F'(s) = f(s) at each
point s where f is continuous.

Proof Let χ_s be the characteristic function of the interval [a, s]. If s_k → b, then
χ_{s_k}(t) → χ_b(t) at every point t, except perhaps t = b, so the dominated
convergence theorem gives

    F(s_k) = \int f \chi_{s_k} \, dt \to \int f \chi_b \, dt = F(b).

(Strictly speaking, this presupposes that b > a. What if b < a?)

Exercise 1 Prove the statement on differentiability by the same proof that was used for the
Riemann integral.


Remark The function f in Theorem 10.1 may not be continuous at any point, in which
case the statement on differentiability becomes vacuous. It is one of the fundamental
theorems in integration theory that F'(s) = f(s) a.e., whether f is continuous
at any point or not. This is proved in Chapter 14.

Exercise 2 Let f, g: R^1 → R^n be integrable on each compact subinterval of an open interval
I. If

    \int_a^b f \, dt = \int_a^b g \, dt

for every two points a and b of I, then f = g a.e. on I. (Hint: The proof is
tricky. The difference h = f − g has integral 0 over every compact subinterval,
and the proof depends on the fact that every open subset of I is the union of a
disjoint sequence of subintervals, which are not compact.)

Now consider functions of the form

    F(t) = \int f(x, t) \, d\mu(x),

where μ is a measure on a set X, and f: X × R^1 → R^n.

THEOREM 10.2 Let I be an interval, and let f: X × I → R^n satisfy the following conditions:
(a) For each t ∈ I, the section f(x, t) is measurable and satisfies
|f(x, t)| ≤ g(x) a.e., where g: X → R^1 is a fixed integrable function.
(b) For almost every x, the section f(x, t) is continuous on I.
Then F(t) = ∫ f(x, t) dμ(x) is continuous on I.

Proof By condition (a) the function F is defined everywhere on I. Let N be a
set of measure 0 in X such that if x ∉ N, then the section f_x(t) = f(x, t) is
continuous on I. If x ∉ N, and t_k → a, then f(x, t_k) → f(x, a). Therefore,
f(x, t_k) → f(x, a) a.e. on X, and the dominated convergence theorem
gives

    F(t_k) = \int f(x, t_k) \, d\mu(x) \to \int f(x, a) \, d\mu(x) = F(a).

Exercise 3 The gamma function

    \Gamma(t) = \int_0^\infty x^{t-1} e^{-x} \, dx

is continuous on 0 < t < ∞. (Hint: In applying Theorem 10.2 you will want
to take I to be an arbitrary compact interval [a, b] with a > 0.)

Exercise 4 Discuss

    F(t) = \int_0^\infty e^{-xt} \frac{\sin x}{x} \, dx.


The next question is the differentiation of

    F(t) = \int f(x, t) \, d\mu(x).

It is to be hoped that

    F'(t) = \int \frac{\partial f(x, t)}{\partial t} \, d\mu(x),

that is, that the derivative is obtained simply by differentiating under the integral.
If this is true (which it is under suitable conditions), then for the gamma function
it gives

    \Gamma'(t) = \int_0^\infty x^{t-1} \log x \, e^{-x} \, dx,    (1)

and for the function F in Exercise 4 it gives

    F'(t) = -\int_0^\infty e^{-xt} \sin x \, dx.    (2)

The theorem is as follows.

THEOREM 10.3 Let μ be a σ-finite measure on X, let I be an open interval in R^1, and let
f: X × I → R^n. Assume

(a) For almost every t ∈ I, the section f(x, t) is measurable on X,
and for some t it is integrable.

(b) For almost every x ∈ X, the section f(x, t) is C^1 on I.

(c) There is an integrable g: X → R^1 such that

    \left| \frac{\partial f(x, t)}{\partial t} \right| \le g(x)    for all t ∈ I and almost all x ∈ X.

Then the function

    F(t) = \int f(x, t) \, d\mu(x)

is of class C^1 on I and satisfies

    F'(t) = \int \frac{\partial f(x, t)}{\partial t} \, d\mu(x).

Exercise 5 Verify formulas (1) and (2) by checking the conditions in this theorem.
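Formula (1) can be tested numerically before it is proved. The sketch below compares a Simpson-rule value of the integral of x^{t−1} log x e^{−x} with a central difference quotient of Python's `math.gamma`. The truncation of the integral to [0, 50], the step counts, and the sample point t = 3 are our own assumptions, chosen so that the integrand is well behaved at both ends; the check illustrates the formula and is not a substitute for verifying the hypotheses of Theorem 10.3.

```python
import math

def integrand(x: float, t: float) -> float:
    """x**(t-1) * log(x) * exp(-x), extended by 0 at x = 0 (valid for t > 1)."""
    if x == 0.0:
        return 0.0
    return x ** (t - 1) * math.log(x) * math.exp(-x)

def simpson(g, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

t = 3.0
by_integral = simpson(lambda x: integrand(x, t), 0.0, 50.0)   # right side of (1)
eps = 1e-5
by_difference = (math.gamma(t + eps) - math.gamma(t - eps)) / (2 * eps)
print(by_integral, by_difference)
```

The two numbers should agree to roughly eight digits; the remaining gap comes from the quadrature and from the finite difference, not from formula (1).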


The first step in the proof of the theorem is a lemma to show that the
combination of measurability in one variable and continuity in the other is
stronger than it looks.

LEMMA Suppose that g: X × I → R^n is continuous on I for almost every x ∈ X
and is measurable on X for almost every t ∈ I. Then g is measurable on
X for every t ∈ I, and g is measurable on X × I.

Proof Let N be a set of measure 0 in X such that if x ∉ N, then g(x, t) is
continuous on I. Let t_0 ∈ I be given, and choose a sequence t_k → t_0 such
that g(x, t_k) is measurable. If x ∉ N, then g(x, t_k) → g(x, t_0), so g(x, t_0)
is measurable.

To prove that g is measurable on X × I, divide I into n equal intervals
I_k^n, choose a point t_k^n in I_k^n, and set

    g_n(x, t) = g(x, t_k^n)    if  t \in I_k^n.

(To avoid overlap we can take the I_k^n closed on the left and open on the
right.) It is plain that each g_n is measurable on X × I and that g_n(x, t) →
g(x, t) if x ∉ N. Therefore, g is measurable on X × I.

Proof of the Theorem


First we show that ∂f/∂t is measurable on X × I. Let N be a subset of
X of measure 0 such that if x ∉ N, then f(x, t) is C^1 on I. Let h_k → 0 and
set

    g_k(x, t) = \frac{f(x, t + h_k) - f(x, t)}{h_k}.

It is easy to see that g_k is measurable on X × I and that g_k(x, t) → ∂f(x, t)/∂t
if x ∉ N. Thus, ∂f/∂t is measurable on X × I.

According to the lemma and condition (c) in the theorem, ∂f/∂t
satisfies the hypotheses of Theorem 10.2. Hence, the function

    G(t) = \int \frac{\partial f(x, t)}{\partial t} \, d\mu(x)    (3)

is continuous on I. For any points a, s ∈ I, Fubini's theorem gives

    \int_a^s G(t) \, dt = \int \left\{ \int_a^s \frac{\partial f(x, t)}{\partial t} \, dt \right\} d\mu(x).    (4)

On the other hand, condition (b) in the theorem shows that for almost
every x we have

    f(x, s) - f(x, a) = \int_a^s \frac{\partial f(x, t)}{\partial t} \, dt.    (5)

It follows that f(x, s) is integrable over X for every s, for we can choose a
so that f(x, a) is integrable, while the integral on the right certainly is by
Fubini and condition (c). Substitution of (5) into (4) gives

    \int_a^s G(t) \, dt = \int [f(x, s) - f(x, a)] \, d\mu(x) = F(s) - F(a).    (6)

Since G is continuous, Theorem 10.1 shows that F'(s) = G(s).

Theorem 10.3 is of theoretical importance, but it also leads to interesting
calculations.

Example Calculate

    F(t) = \int_0^\infty e^{-xt} \frac{\sin x}{x} \, dx.

According to the theorem, we have

    F'(t) = -\int_0^\infty e^{-xt} \sin x \, dx,    t > 0.

Integration by parts twice (differentiating e^{−xt} and integrating the other factor)
shows that F'(t) = −t²F'(t) − 1 and, consequently, that

    F'(t) = -\frac{1}{1 + t^2},    t > 0.

Therefore, since F is C^1,

    F(b) - F(s) = -\int_s^b \frac{dt}{1 + t^2} = \arctan s - \arctan b,    s, b > 0.    (7)

Exercise 6 Show that lim_{b→∞} F(b) = 0. (Dominated convergence theorem!)


Letting b → ∞ in formula (7), we get

    F(s) = \frac{\pi}{2} - \arctan s,    for s > 0.

In other words,

    \int_0^\infty e^{-sx} \frac{\sin x}{x} \, dx = \frac{\pi}{2} - \arctan s    for s > 0.    (8)

If we could put s = 0 in this formula, we would obtain the formula

    \int_0^\infty \frac{\sin x}{x} \, dx = \frac{\pi}{2}.    (9)
But we cannot. Nevertheless, formula (9) is correct, and the way to establish
it is to start from the beginning with the function

    F_r(t) = \int_0^r e^{-xt} \frac{\sin x}{x} \, dx.

The same calculations lead to

    F_r'(t) = -\frac{1 - e^{-rt}\cos r - t e^{-rt}\sin r}{1 + t^2}.

This time 0 does not have to be excluded, and we get

    F_r(0) - F_r(b) = \int_0^b \frac{1 - e^{-rt}\cos r - t e^{-rt}\sin r}{1 + t^2} \, dt.

Letting b → ∞ (dominated convergence theorem again!), we get

    F_r(0) = \int_0^\infty \frac{1 - e^{-rt}\cos r - t e^{-rt}\sin r}{1 + t^2} \, dt.

By the definition of the improper integral and once more the dominated
convergence theorem, we have

    \int_0^\infty \frac{\sin x}{x} \, dx = \lim_{r \to \infty} F_r(0) = \int_0^\infty \frac{dt}{1 + t^2} = \frac{\pi}{2}.

This is a good example, and a typical one, of how difficult theorems from
the general theory are needed to do apparently simple explicit calculations.
It is also a typical example of how even the powerful general theorems do not
usually fill the bill exactly but have to be twisted around to fit the particular
problem.

Exercise 7 What does Theorem 10.3 say when X is the positive integers and μ is the
counting measure?


11 CONVOLUTION


The convolution of two functions f, g: R^n → R^1 is the function f * g defined by

    f * g(x) = \int f(x - y)\, g(y) \, dy    (1)


at any point x where the integrand f(x — y)g(y) is integrable. The integral is 
of course the Lebesgue integral. The convolution has many important proper- 
ties, some of which will be proved in this section. Before starting, it is necessary 
to obtain a couple of very simple formulas for changes of variable. 


THEOREM 11.1 If f: R^n → R^1 is Lebesgue integrable, then

(a) For every point a ∈ R^n, f(x + a) is Lebesgue integrable, and
∫ f(x + a) dx = ∫ f(x) dx.

(b) The function f(−x) is Lebesgue integrable, and ∫ f(−x) dx = ∫ f(x) dx.

(c) For each ρ > 0, the function f(x/ρ) is Lebesgue integrable, and
∫ f(x/ρ) dx = ρ^n ∫ f(x) dx.

Proof The procedure in proving such formulas is always the same. The first
step is to prove the formula when f is the characteristic function of a
measurable set. Then the rest follows automatically: If f is a simple
function, then it is a linear combination of characteristic functions and
the formula follows by linearity. If f is nonnegative and measurable, then
it is an increasing limit of simple functions and the formula follows from
the monotone convergence theorem. Finally, if f is integrable, it is a
difference of nonnegative integrable functions. Therefore, what we have
to do is to prove the formula in each case when f is the characteristic
function of a measurable set.

Case a. If f is the characteristic function of the set E, then f(x + a)
is the characteristic function of the set E − a, and it is obvious that the
measure of E − a is equal to the measure of E.

Case b. f(−x) is the characteristic function of the set −E, and it is
obvious that the measure of −E is equal to the measure of E.

Case c. Let Tx = ρx. Then f(x/ρ) is the characteristic function of
T(E), and according to Exercise 13 of Section 2, the measure of T(E) is
ρ^n times the measure of E.

Some of the main properties of the convolution are as follows.

THEOREM 11.2 Let f, g, h: R^n → R^1 be integrable. Then
(a) f * g is defined almost everywhere and is integrable.
(b) f * g = g * f.
(c) (f * g) * h = f * (g * h) a.e.

Proof Everything depends on Fubini's theorem, so before starting we have to
show that f(x − y)g(y) is measurable on R^n × R^n. It is clear that g(y)
is, so all we have to look at is f(x − y).

What we have to show is that if G is an open set in R^1 and F(x, y) =
f(x − y), then F^{-1}(G) is a measurable set in R^{2n}. Let U: R^{2n} → R^{2n} be
the linear transformation defined by

    U(x, y) = (x - y, x + y).

Then F(x, y) ∈ G if and only if x − y ∈ f^{-1}(G), and this is true if and
only if U(x, y) ∈ f^{-1}(G) × R^n. In other words,

    F^{-1}(G) = U^{-1}(f^{-1}(G) \times R^n).

Now, f^{-1}(G) is measurable in R^n, because f is measurable. Hence,
f^{-1}(G) × R^n is measurable in R^{2n} by Theorem 9.3. Finally, F^{-1}(G) is
measurable by Theorem 4.5 (with φ = U^{-1}).

To prove part (a), use Fubini's theorem and Theorem 11.1 as follows:

    \int |f| * |g|(x) \, dx = \int \left\{ \int |f(x - y)|\,|g(y)| \, dy \right\} dx
                            = \int |g(y)| \left\{ \int |f(x - y)| \, dx \right\} dy = \int |f(x)| \, dx \int |g(x)| \, dx.    (2)

Since f and g are integrable, the right side is finite, which shows that
f(x − y)g(y) is integrable for almost all x, that is, that f * g is defined a.e.
By Fubini's theorem, f * g is measurable, and since |f * g(x)| ≤ |f| * |g|(x)
and the latter is integrable by (2), it follows that f * g is integrable.


Remark If we define

    \|f\| = \int |f| \, dx,    (3)

then formula (2) gives

    \|f * g\| \le \|f\| \, \|g\|.    (4)

The number ‖f‖ plays the role of an absolute value on the space of integrable
functions.
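Inequality (4) has an exact discrete counterpart that is easy to experiment with. In the sketch below, functions on R^1 are replaced by finite lists of samples on a grid of spacing `dx`, the integral by a Riemann sum, and the convolution by the corresponding discrete sum. The grid, the random sample data, and the helper names are all assumptions of ours. For the sampled functions the inequality holds exactly, with equality when both functions are nonnegative, as in formula (2).

```python
import random

def convolve(f: list[float], g: list[float], dx: float) -> list[float]:
    """Discrete analog of (f * g)(x) = integral of f(x - y) g(y) dy."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * dx
    return out

def norm1(f: list[float], dx: float) -> float:
    """Discrete analog of ||f|| = integral of |f| dx."""
    return sum(abs(v) for v in f) * dx

random.seed(0)
dx = 0.01
f = [random.uniform(-1, 1) for _ in range(200)]
g = [random.uniform(-1, 1) for _ in range(150)]

lhs = norm1(convolve(f, g, dx), dx)
rhs = norm1(f, dx) * norm1(g, dx)
print(lhs <= rhs + 1e-12, lhs, rhs)

# With nonnegative samples the two sides agree up to rounding.
fp, gp = [abs(v) for v in f], [abs(v) for v in g]
print(norm1(convolve(fp, gp, dx), dx), norm1(fp, dx) * norm1(gp, dx))
```

The strict inequality in the signed case reflects cancellation inside the convolution sum, which the absolute values on the right side do not see.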


Exercise 1 For which f is || f|| = 0? 


To prove part (b) of the theorem we use parts (a) and (b) of Theorem 11.1
to write

    f * g(x) = \int f(x - y)\, g(y) \, dy = \int f(z)\, g(x - z) \, dz = g * f(x).

To prove part (c) write

    f * g(x - z) = g * f(x - z) = \int g(x - z - y)\, f(y) \, dy;

therefore,

    (f * g) * h(x) = \int f * g(x - z)\, h(z) \, dz = \int\!\!\int g(x - z - y)\, f(y)\, h(z) \, dy \, dz
                   = \int g * h(x - y)\, f(y) \, dy = (g * h) * f(x) = f * (g * h)(x).

Exercise 2 Justify this calculation by Fubini.


THEOREM 11.3 Let f be integrable on each compact set and let g be of class C^1 and vanish
outside a compact set. Then f * g is of class C^1 and D_j(f * g) = f * D_j g.

Proof It is simply a question of differentiating

    f * g(x) = \int f(y)\, g(x - y) \, dy

under the integral sign, which is immediately possible by Theorem 10.3.

DEFINITION 11.4 A function f: R^n → R^1 is of class C_0^m if it is of class C^m and vanishes outside
a compact set.

THEOREM 11.5 If f is integrable on each compact set and g is of class C_0^m, then f * g is of
class C^m and

    D^k (f * g) = f * D^k g

for any derivative D^k of order ≤ m.

Exercise 3 Prove the theorem. (It is an immediate consequence of the previous one, of
course.)


12 APPROXIMATION THEOREMS


Some very nice approximation theorems are possible by convolution. The idea 
is that if g looks as shown in Figure 6, that is, is very small outside a neighborhood 
of 0 but peaks sharply at 0 so that its integral is 1, then f * g is close to f in various
senses, depending on what kind of function f is. The process of approximating 
the function f in this way is called mollifying or regularizing the function. 


Figure 6

To make the process systematic, we shall choose a fixed nonnegative function
e in C_0^∞ with integral 1, and set

    e_\rho(x) = \rho^{-n} e\left( \frac{x}{\rho} \right),    \rho > 0.

According to Theorem 11.1(c), the integral of e_ρ is 1. Moreover, if e vanishes
outside the ball B(0; r), then e_ρ vanishes outside the ball B(0; ρr). Therefore,
as ρ → 0, the functions e_ρ look more or less like Figure 6, with sharper and taller
peaks. As a matter of convenience we shall suppose that e vanishes outside
B(0; 1), so e_ρ vanishes outside B(0; ρ). This is really immaterial, but it does
simplify occasional formulas.

THEOREM 12.1 Let ρ → 0. Then
(a) If f is uniformly continuous, then f * e_ρ → f uniformly.
(b) If f is continuous, then f * e_ρ → f uniformly on each compact set.

Proof Since the integral of e_ρ is 1, we have

    f(x) - f * e_\rho(x) = \int (f(x) - f(y))\, e_\rho(x - y) \, dy.    (1)

Given ε > 0, choose δ > 0 so that if |x − y| < δ, then |f(x) − f(y)| < ε.
If ρ < δ, then e_ρ(x − y) = 0 unless |x − y| < ρ < δ; so (1) gives

    |f(x) - f * e_\rho(x)| \le \int \varepsilon\, e_\rho(x - y) \, dy = \varepsilon \int e_\rho(x) \, dx = \varepsilon,

which proves part (a).

We shall prove (b) by showing that f * e_ρ → f uniformly on each ball
B(0; r). Choose φ continuous and equal to 1 on B(0; r + 1) and equal
to 0 outside B(0; r + 2), and set g = φf. Part (a) applies to g, which is
certainly uniformly continuous. Moreover, g = f on B(0; r + 1).
Therefore, the result follows from the following lemma, which says that
g * e_ρ = f * e_ρ on B(0; r) if ρ < 1.

LEMMA 12.2 Let E be any set and let E_δ be the set of points at distance < δ from E. If
g = f on E_δ and ρ < δ, then g * e_ρ = f * e_ρ on E.

Proof If h = g − f, then h(y)e_ρ(x − y) is identically 0 when x ∈ E. Indeed,
e_ρ(x − y) = 0 unless |x − y| < ρ < δ, and if this is the case, then y ∈ E_δ,
so h(y) = 0.

Exercise 1 If E is measurable and ρ < δ, then χ = χ_{E_δ} * e_ρ is 1 on E, 0 outside E_{2δ}, and
between 0 and 1 everywhere.
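To make the functions e_ρ concrete, here is a one-dimensional sketch (n = 1). It assumes the standard bump function exp(−1/(1 − x²)) on (−1, 1) as the fixed function e, normalized numerically; this particular choice, the helper names, and the grid sizes are ours, since the text only requires some nonnegative function of class C_0^∞ with integral 1. The code checks that e_ρ keeps integral 1 for several values of ρ and that f * e_ρ at the point 0 approaches f(0) for the continuous function f(x) = |x|.

```python
import math

def bump(x: float) -> float:
    """Unnormalized C-infinity bump supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def midpoint(g, a: float, b: float, n: int = 4_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

C = midpoint(bump, -1.0, 1.0)        # normalizing constant, so that e has integral 1

def e(x: float) -> float:
    return bump(x) / C

def e_rho(x: float, rho: float) -> float:
    """e_rho(x) = rho**(-n) * e(x / rho) with n = 1."""
    return e(x / rho) / rho

def mollify(f, x: float, rho: float) -> float:
    """(f * e_rho)(x); the integrand vanishes unless |x - y| < rho."""
    return midpoint(lambda y: f(y) * e_rho(x - y, rho), x - rho, x + rho)

for rho in (1.0, 0.1, 0.01):
    mass = midpoint(lambda x: e_rho(x, rho), -rho, rho)
    # mass stays near 1; the last value tends to |0| = 0 as rho shrinks
    print(rho, mass, mollify(abs, 0.0, rho))
```

For f(x) = |x| the mollified value at 0 is ρ times a fixed constant, so the printed values shrink in proportion to ρ, in line with part (a) of Theorem 12.1.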


Exercise 2 If F is compact and G is open and G ⊃ F, then there is a C^∞ function χ that is 1
on F, 0 outside G, and between 0 and 1 everywhere.

Exercise 3 State and prove an analog of Theorem 12.1 with continuity replaced by class C^m.

THEOREM 12.3 If f is integrable, then ‖f − f * e_ρ‖ → 0, where ‖f‖ = ∫ |f| dx.

Proof According to formula (4) of Section 11, we have

    \|f * e_\rho - g * e_\rho\| \le \|f - g\| \, \|e_\rho\| = \|f - g\|.    (2)

Therefore, in order to prove the theorem for a given f, it is enough to show
that for each ε > 0 there is a g such that the theorem holds for g and such
that ‖f − g‖ < ε. In particular, it is enough to prove the theorem when
f is an integrable simple function. And then by linearity it is enough to
prove the theorem when f is the characteristic function of a measurable
set E of finite measure. Given ε > 0, we can choose a compact F ⊂ E
so that ‖χ_E − χ_F‖ = μ(E − F) < ε; so, in fact, it is enough to prove the
theorem when f is the characteristic function of a compact set F. And
this we proceed to do.

Given ε > 0, choose an open G ⊃ F so that μ(G − F) < ε, and then
choose δ > 0 so that F_{2δ} ⊂ G. According to Exercise 1,

    \|\chi_{F_\delta} * e_\rho - \chi_F\| \le \mu(G - F) < \varepsilon.

According to formula (2),

    \|\chi_{F_\delta} * e_\rho - \chi_F * e_\rho\| \le \|\chi_{F_\delta} - \chi_F\| \le \mu(G - F) < \varepsilon.

The two together give

    \|\chi_F * e_\rho - \chi_F\| < 2\varepsilon    if  \rho < \delta,

which completes the proof.


Theorems 12.1 and 12.3 and Exercise 2 indicate the value of approximation 
by convolution. It gives the best possible approximation within the class of 
functions considered. That is, if f is uniformly continuous, then the approximation
is uniform. If f is uniformly continuous along with all derivatives of orders
≤ m, then the approximation is uniform along with all derivatives of orders ≤ m.
If f is integrable, the approximation is in the natural sense of Theorem 12.3. 
This convolution approximation has the same character with respect to almost 
all of the important classes of functions (of which there are many that we shall 
not have the time to introduce). 


Exercise 4 Go back to Section 8 of Chapter 6 and redo the proof of the Weierstrass approximation
theorem in the light of the present ideas on approximation by convolution.
(But do it in R^n, of course. The result is known in R^n by virtue of the
Stone-Weierstrass theorem.) What can you say when f is of class C^m instead of
just continuous?


13 MULTIPLE SERIES

There are many ways to sum a double series

    \sum_{j,k=0}^{\infty} a_{jk}.

One is to sum first on j, holding k fixed, and then to sum on k. Another is to
sum first on k, holding j fixed, and then to sum on j. A third is to sum over all
j and k with j + k ≤ r, and then let r → ∞. A fourth is to sum over all j
and k with j ≤ r and k ≤ r, and then let r → ∞.

Exercise 1 Give an example where the four methods lead to different results.

The example shows that there is no reasonable theory of convergence of
such series without some additional restriction. There is a very reasonable
theory of absolute convergence. In fact, it is already included in our theory of
integration.

Let N denote the set of nonnegative integers, and let N^p be the set of
p-tuples of nonnegative integers. If k = (k_1, ..., k_p) ∈ N^p, let |k| = k_1 +
··· + k_p. The theory of absolutely convergent series of dimension p is
exactly the theory of integration on N^p with respect to the counting measure ν.
Note that if μ is the counting measure on N, then ν = μ × μ × ··· × μ
(p factors), so Fubini will be applicable. Recall that all sets and functions are
measurable with respect to a counting measure.

DEFINITION 13.1 A multiple sequence of dimension p in R^n is a function a: N^p → R^n.

As in the case of ordinary sequences we sometimes write a_k or a_{k_1 ··· k_p}
for a(k) = a(k_1, ..., k_p), and {a_k} or {a_{k_1 ··· k_p}} for the sequence itself,
i.e., for a.

DEFINITION 13.2 The series associated with a multiple sequence a: N^p → R^n is absolutely
convergent if the function a is ν-integrable, ν being the counting measure on
N^p. If this is the case, we call the integral the sum of the series and write

    \sum a_k = \sum a_{k_1 \cdots k_p} = \int a \, d\nu.


Let us see what Fubini has to say about the four ways to sum a double
series suggested at the beginning. (The fact that p = 2 simplifies the notation,
but the arguments are general.) In the first and second methods what we have
are the repeated integrals

    \int \left\{ \int a(j, k) \, d\mu(j) \right\} d\mu(k)    and    \int \left\{ \int a(j, k) \, d\mu(k) \right\} d\mu(j),

μ being the counting measure on N. Since ν = μ × μ, Fubini says that if a is
ν-integrable, then these two integrals are equal to the sum of the series, that is,
to ∫ a dν. It also says, and this is very important, that if either of the above
integrals is finite when a is replaced by |a|, then a must be ν-integrable.

The other two methods can be described like this: There is an increasing
sequence {E_r} of subsets of N^p with union N^p, and the sum envisioned is the
number

    \lim_{r \to \infty} \int_{E_r} a \, d\nu.

In the third method E_r consists of the (j, k) with j + k ≤ r, and in the fourth it
consists of the (j, k) with j ≤ r and k ≤ r.

Exercise 2 If E_r ↗ N^p, then a is ν-integrable if and only if lim_{r→∞} ∫_{E_r} |a| dν < ∞; if this
is the case, then

    \int a \, d\nu = \lim_{r \to \infty} \int_{E_r} a \, d\nu.

(Hint: Use Fatou's lemma and the dominated convergence theorem.)


The outcome of this discussion is that all four methods (and indeed any 
others you can think of!) lead to the same result if a is y integrable, and that it 
can be decided whether a is v-integrable by the methods themselves simply by 
replacing a by |a|. 
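
As a numerical illustration of the four methods of summation, here is a minimal Python sketch. The particular series $a(j, k) = (-1/2)^j (1/3)^k$ and the truncation level `R` are assumptions chosen for the illustration (they do not come from the text); the series is absolutely convergent, and its sum is the product of two geometric sums, namely 1.

```python
def a(j, k):
    """Terms of an absolutely convergent double series (illustrative choice)."""
    return (-0.5) ** j * (1.0 / 3.0) ** k

R = 60  # truncation level; the neglected tails are far below double precision

# Methods 1 and 2: repeated sums, taken in the two possible orders.
rows_then_cols = sum(sum(a(j, k) for j in range(R)) for k in range(R))
cols_then_rows = sum(sum(a(j, k) for k in range(R)) for j in range(R))

# Method 3: triangles E_r = {(j, k): j + k <= r}, here with r = R - 1.
triangles = sum(a(j, k) for j in range(R) for k in range(R) if j + k <= R - 1)

# Method 4: squares E_r = {(j, k): j <= r and k <= r}, here with r = R - 1.
squares = sum(a(j, k) for j in range(R) for k in range(R))

# Product of the two geometric sums 1/(1 + 1/2) and 1/(1 - 1/3).
exact = (1.0 / 1.5) * (1.0 / (1.0 - 1.0 / 3.0))

for name, value in [("rows then columns", rows_then_cols),
                    ("columns then rows", cols_then_rows),
                    ("triangles", triangles),
                    ("squares", squares)]:
    print(f"{name:18s} {value:.15f}")
```

All four truncated sums agree with the exact value to rounding error, as the discussion predicts for an integrable $a$.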

One important example of a multiple series is a multiple power series, which is a series of the form

$$f(x) = \sum a_{k_1 \cdots k_n} \, x_1^{k_1} \cdots x_n^{k_n}. \tag{1}$$

If we write

$$x^k = x_1^{k_1} \cdots x_n^{k_n} \quad \text{for } x \in R^n \text{ and } k \in N^n,$$

the series becomes

$$f(x) = \sum a_k x^k \tag{2}$$

so that it looks just like an ordinary power series. The results also look like the results for ordinary power series. To state them it is helpful to introduce some notation. If $r, \rho \in R^n$, we write $r > \rho$ to mean that $r_j > \rho_j$ for each j. If $r > 0$, we write $D_r$ for the rectangle

$$D_r = \{x \in R^n : |x_j| < r_j \text{ for each } j\}.$$


THEOREM 13.3 If the series (2) converges absolutely at some point $x = \xi$ with $|\xi_j| = r_j \ne 0$ for each j, then it converges absolutely and uniformly on $D_r$, the function f is $C^\infty$ on $D_r$, and the series can be differentiated term by term; i.e.,

$$\frac{\partial f}{\partial x_j}(x) = \sum k_j a_k x^{k - e_j}$$

with absolute convergence on $D_r$. (As usual, $e_j$ is the vector with jth coordinate 1 and all the others 0.)

Proof It is obvious that the series converges absolutely and uniformly on $D_r$, for if $x \in D_r$, then $|a_k x^k| \le |a_k r^k|$, and the function on the right is integrable over $N^n$ by hypothesis. To prove the assertion on differentiability, fix any $\rho$ with $0 < \rho < r$ and put $h_j = \rho_j / r_j$. If $x \in D_\rho$, then

$$|a_k x^k| \le |a_k \rho^k| = |a_k r^k| h^k \le M h^k,$$

the last inequality coming from the fact that each term of the convergent series $\sum |a_k r^k|$ is at most the sum. Now, the series $\sum k_j h^{k - e_j}$ converges. Indeed, the terms are nonnegative, so Fubini says that it can be summed first with respect to $k_1$, then with respect to $k_2$, and so on. The resulting series are all geometric, except for the sum with respect to $k_j$, which is the derivative of a geometric series. If j = 1, for example, the result is that

$$\sum k_1 h^{k - e_1} = \frac{1}{(1 - h_1)^2 (1 - h_2) \cdots (1 - h_n)}.$$

Now Theorem 10.3 does the job if we take $X = N^n$, $J = (-\rho_j, \rho_j)$, and $g(k) = M k_j h^{k - e_j}$. This can be done with each of the variables, so we are back in the initial position, but with regard to the derivatives of f and the differentiated series (on any smaller rectangle). Therefore, f is of class $C^\infty$ on $D_r$ and the series can be differentiated term by term at will with no loss of absolute convergence on $D_r$.

Exercise 3 If f is given by (2) with absolute convergence on $D_r$, $r > 0$, then the series is the Taylor series of f; that is,

$$a_k = \frac{D_1^{k_1} \cdots D_n^{k_n} f(0)}{k_1! \cdots k_n!}.$$


14 REGULAR VALUES AND SARD’S THEOREM


Let $f : R^n \to R^m$ be of class $C^1$ on an open set G. A point $y \in f(G)$ is called a regular value of f if every point of the set

$$M_y = \{x \in G : f(x) = y\}$$

is a regular point of f. If y is a regular value, then of course $M_y$ is a smooth manifold of dimension n − m.

Since it is required that every point of $M_y$ be a regular point of f, it might be expected that the regular values are pretty sparse, but A. Sard found a remarkable theorem.

THEOREM 14.1 (Sard’s Theorem) If f is of sufficiently high class $C^r$, then almost every value of f is a regular value.

A point $x \in G$ is called a critical point of f if it is not a regular point, and a point $y \in f(G)$ is called a critical value of f if it is the value of f at some critical point (i.e., if the set $M_y$ contains at least one critical point). In terms of critical points and critical values, Sard’s theorem is as follows:

THEOREM 14.2 (Sard’s Theorem) If f is of sufficiently high class $C^r$, then the set of critical values has measure 0. Equivalently, if K is the set of critical points, then f(K) has measure 0.

Note that the theorem is already known (and not very interesting) when n < m, for Corollary 2.9 shows that the whole set f(G) has measure 0. Therefore, we shall suppose from now on that $n \ge m$. In this case the set K of critical points is the set where the differential df has rank < m; if we write $K_j$ for the set of points where df has rank j, then what we shall have to prove is that $f(K_j)$ has measure 0 for j < m. The first step is to handle $f(K_0)$.

LEMMA 14.3 If f is of class $C^r$ with $r > n(n + 1)/2m$, then $f(K_0)$ has measure 0.

Proof The proof goes by induction on the dimension n. (m is fixed.) The induction is started by the remark above which shows that the lemma holds for n < m.

Let $A_k$ be the set of points where the partial derivatives $D^i f$ vanish for $1 \le |i| \le k$, and let p be the first integer > n/m. (Note that $n/m < n(n + 1)/2m$, so $p \le r$.) Since $K_0 = A_1$, we have

$$|f(K_0)| = |f(A_1)| \le \sum_{k=1}^{p-2} |f(A_k - A_{k+1})| + |f(A_{p-1})|,$$

so it is enough to prove that each $f(A_k - A_{k+1})$ has measure 0 and that $f(A_{p-1})$ has measure 0. The latter is Exercise 12 of Section 2, so we can concentrate on a fixed $f(A_k - A_{k+1})$. By Theorem 2.6, it is enough to show that each point $a \in A_k - A_{k+1}$ has a neighborhood $G_0$ such that $|f((A_k - A_{k+1}) \cap G_0)| = 0$. Therefore, let a be a fixed point of $A_k - A_{k+1}$.


By the definitions of $A_k$ and $A_{k+1}$ there exist indices $i_0$ with $|i_0| = k$ and $j_0$ with $1 \le j_0 \le m$ such that $\nabla D^{i_0} f_{j_0}(a) \ne 0$. Therefore, the set

$$N = \{x \in G : D^{i_0} f_{j_0}(x) = 0\}$$

is a smooth manifold of dimension n − 1 in $R^n$ in a neighborhood of the point a, and it is of class $C^s$ with

$$s = r - k \ge r - (p - 2) > \frac{n(n + 1)}{2m} - \frac{n}{m} = \frac{n(n - 1)}{2m}.$$

Let $\varphi$ be a local parametric representation of N at a which is of class $C^s$ on an open $G' \subset R^{n-1}$ and set $g = f \circ \varphi$. Let $K_0'$ be the set of points in $G'$ where dg has rank 0 (that is, dg = 0). Since $dg(t) = df(\varphi(t)) \, d\varphi(t)$, it follows that if $\varphi(t) \in K_0$, then $dg(t) = 0$, that is, that $\varphi(K_0') \supset K_0 \cap \varphi(G')$, and hence that

$$g(K_0') \supset f(K_0 \cap \varphi(G')) \supset f(A_k \cap \varphi(G')).$$

Now, $\varphi(G')$ contains a neighborhood of a in N, so there is a neighborhood $G_0$ of a in $R^n$ such that $\varphi(G') \supset N \cap G_0 \supset A_k \cap G_0$. Consequently,

$$g(K_0') \supset f(A_k \cap G_0).$$

The induction hypothesis gives that $|g(K_0')| = 0$, and the lemma is proved.

Now we turn to the other $f(K_j)$.

LEMMA 14.4 If f is of class $C^r$ with $r > n(n + 1)/2$, then $f(K_j)$ has measure 0 for j < m.

Proof We fix j and use bars to denote the first j coordinates and primes to denote the rest (both in $R^n$ and in $R^m$). First we prove the lemma when f has the special form

$$f(x) = (\bar{x}, h(x)), \quad \text{where } h : R^n \to R^{m-j}, \tag{1}$$

and then we show how to reduce the general case to this one.

If $\bar{x} \in R^j$, then $h_{\bar{x}}$ is the section of h defined by

$$h_{\bar{x}}(x') = h(\bar{x}, x').$$

If A is a set, then $A_{\bar{x}}$ is the section of A defined by

$$A_{\bar{x}} = \{x' : (\bar{x}, x') \in A\}.$$

When f has the form (1) we have

$$f(A)_{\bar{x}} = h_{\bar{x}}(A_{\bar{x}}). \tag{2}$$




We apply this to $A = K_j$. Since the Jacobi matrix of f is

$$\begin{pmatrix} I_j & 0 \\ \dfrac{\partial h}{\partial \bar{x}} & \dfrac{\partial h}{\partial x'} \end{pmatrix},$$

it follows that $A = \{x : \partial h / \partial x' = 0\}$, and hence that $A_{\bar{x}} = \{x' : dh_{\bar{x}}(x') = 0\}$. From formula (2) and Lemma 14.3 we get that

$$|f(A)_{\bar{x}}|_{m-j} = 0;$$

from Fubini we get that $|f(A)|_m = 0$. [The differentiability needed to use the first lemma is class $C^r$ with $r > (n - j)(n - j + 1)/2(m - j)$, and this is obvious for $r > n(n + 1)/2$.]

Now we proceed to the general case. As usual it is enough to show that each point $a \in K_j$ has a neighborhood in $K_j$ whose image has measure 0, so let a be a fixed point in $K_j$. The Jacobi matrix of f has j linearly independent rows. By relabeling the coordinates in $R^m$ we can assume that the first j are linearly independent. The Jacobi matrix of $\bar{f}$ has j linearly independent columns. By relabeling the coordinates in $R^n$ we can assume that the first j are linearly independent. With this choice of coordinates the Jacobi matrix $\partial \bar{f} / \partial \bar{x}$ is nonsingular at the point a, so by the implicit-function theorem we can solve the equation $\bar{f}(\bar{x}, x') = \bar{y}$ for $\bar{x}$ in terms of $(\bar{y}, x')$. The solution is a function $\varphi : R^n \to R^j$ that is of class $C^r$ on a neighborhood of $(\bar{f}(a), a')$ and maps any such neighborhood on a neighborhood of $\bar{a}$. We define

$$\psi(\bar{y}, x') = (\varphi(\bar{y}, x'), x') \quad \text{and} \quad g = f \circ \psi.$$

It is plain from the construction that g does have the form (1); so if we can show that $g(K_j') \supset f(K_j \cap G_0)$, $K_j'$ being the set of points where dg has rank j and $G_0$ being a neighborhood of a, then we shall be finished.

The function $\psi$ is regular at the point $(\bar{f}(a), a')$. Indeed, its Jacobi matrix is

$$\begin{pmatrix} \dfrac{\partial \varphi}{\partial \bar{y}} & \dfrac{\partial \varphi}{\partial x'} \\ 0 & I \end{pmatrix},$$

which is nonsingular because $\partial \varphi / \partial \bar{y}$ is nonsingular. (Why is $\partial \varphi / \partial \bar{y}$ nonsingular?) Since $dg(z) = df(\psi(z)) \, d\psi(z)$, it follows that the rank of dg at z is equal to the rank of df at $\psi(z)$, hence that $\psi(K_j') \supset K_j \cap G_0$, and finally that $g(K_j') \supset f(K_j \cap G_0)$, $G_0$ being some neighborhood of a.


Remark Lemma 14.4 establishes Sard’s theorem when f is of class $C^r$ with $r > n(n + 1)/2$. The theorem is actually true if $r \ge n - m + 1$ (and of course $r \ge 1$ if n < m). With this minimum differentiability the outline of the proof remains the same; but there is one very sticky point, and the theorem is usually called “Hard Sard.”

[Figure: Arthur Sard]

Let N be a smooth manifold of dimension n and class $C^r$ with $r > n(n + 1)/2$, and let $f : N \to R^m$ be a function of class $C^r$. We have seen in Section 4 of Chapter 11 that for any point $a \in N$, df(a) is a linear transformation from the tangent space $T_a(N)$ into $R^m$. The point a is called a regular point if df(a) has the maximum possible rank (i.e., the smaller of n and m) and a critical point otherwise. Regular values and critical values are defined as before. In this context Sard’s theorem remains perfectly correct. Indeed, it is an immediate consequence of the original version and the following exercise.

Exercise 1 The point $a \in N$ is a regular point of f if and only if it is a regular point of $f_\varphi = f \circ \varphi$, where $\varphi$ is a local parametric representation of N at a.

Exercise 2 Write out the proof of Sard’s theorem for functions on manifolds. Why assume that $N \in C^r$ with $r > n(n + 1)/2$?

Example Let N be the torus obtained by revolving the circle $x^2 + (y - 2)^2 = 1$ around the x axis and let $f(x, y, z) = z$. The gradient of f is (0, 0, 1), so the critical points are those where this vector is orthogonal to the tangent plane (i.e., the points where the tangent plane is horizontal). These are the four points $(0, 0, \pm 3)$ and $(0, 0, \pm 1)$. The critical values are the numbers $\pm 3$ and $\pm 1$. (If these statements are not apparent, it will be helpful to look back at Section 4 of Chapter 11.)
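
The torus example can be checked numerically. The sketch below is only an illustration under stated assumptions: the parametrization `torus_point(t, s)` and all function names are introduced here, not taken from the text. It uses the fact (Exercise 1) that a point is critical for f exactly when the corresponding parameter point is critical for f composed with a local parametric representation.

```python
import math

def torus_point(t, s):
    """Point of the torus: the circle (cos t, 2 + sin t) in the xy plane,
    revolved through the angle s about the x axis (assumed parametrization)."""
    rho = 2.0 + math.sin(t)  # distance from the x axis
    return (math.cos(t), rho * math.cos(s), rho * math.sin(s))

def height(t, s):
    """f(x, y, z) = z expressed in the parameters (t, s)."""
    return torus_point(t, s)[2]

def height_gradient(t, s):
    """Partial derivatives of the height with respect to t and s."""
    return (math.cos(t) * math.sin(s), (2.0 + math.sin(t)) * math.cos(s))

# The gradient vanishes only when cos s = 0 and cos t = 0.
half_pi = math.pi / 2
candidates = [(t, s) for t in (half_pi, -half_pi) for s in (half_pi, -half_pi)]

critical_points = [tuple(round(c, 9) + 0.0 for c in torus_point(t, s))
                   for t, s in candidates]
critical_values = sorted(round(height(t, s), 9) for t, s in candidates)
max_gradient = max(abs(g) for t, s in candidates for g in height_gradient(t, s))

print(critical_points)
print(critical_values)
```

The four parameter points give the four critical points on the z axis and the critical values −3, −1, 1, 3 named in the example.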


It is interesting to look at the smooth manifolds

$$M_\alpha = \{(x, y, z) : f(x, y, z) = \alpha\}.$$

When $\alpha$ is one of the critical values, $M_\alpha$ is not a smooth manifold of dimension 2 − 1 = 1. When $\alpha$ is a regular value, $M_\alpha$ is, of course, a smooth manifold of dimension 1. The interesting thing is that the character of $M_\alpha$ remains the same as long as $\alpha$ does not pass through a critical value, and the character changes when $\alpha$ does pass through a critical value (Figure 7). When $\alpha = \pm 3$, $M_\alpha$ is a single point, obviously not a smooth one-dimensional manifold. When $\alpha = \pm 1$, $M_\alpha$ is essentially a pair of tangent circles. (Use the results of Section 3 of Chapter 11 to show that this is not a smooth one-dimensional manifold.) When $\alpha$ is between 1 and 3 or between −1 and −3, $M_\alpha$ is essentially a circle; when $\alpha$ is between −1 and 1, $M_\alpha$ is essentially a pair of disjoint circles.

This phenomenon, that the character of $M_\alpha$ changes only when $\alpha$ passes through a critical value, is a general one, not an accidental feature of the example. It provides one of the powerful methods for studying compact manifolds.

Exercise 3 Let N be a smooth manifold of dimension n, and let $f : N \to R^m$ be of class $C^1$. If $y \in R^m$ is a regular value of f, then the set

$$M_y = \{x \in N : f(x) = y\}$$

is a smooth manifold of dimension n − m. State and prove the result also when things are of class $C^r$.

[Figure 7]


Exercise 4 Try to formulate a Sard’s theorem when $f : N \to M$, where N and M are smooth manifolds of dimensions n and m of suitable class $C^r$. The point here is that df(a) maps the tangent space $T_a(N)$ into the tangent space $T_b(M)$, $b = f(a)$, which, of course, has to be proved. Now define regular value and define the notion of a set of m-dimensional measure 0 on M, and state and prove Sard’s theorem. (You can define the notion of measure 0 either by using local parametric representations or by using the Hausdorff measure $\mu_m$. The best thing is to do it both ways and to show that the two notions are equivalent. This involves going back to Section 2 and redoing the results for Hausdorff measures.)

Sard’s theorem and Exercise 7, Section 2 of Chapter 12, give an immediate proof of the functional-dependence theorem, which asserts that functions $f_1, \ldots, f_m$ are functionally dependent on a compact set K if their gradients are linearly dependent at each point of K. The precise statement is as follows.

Exercise 5 Let $f_1, \ldots, f_m$ be real-valued functions of sufficiently high class $C^r$ on a neighborhood of a compact set $K \subset R^n$. If the gradients are linearly dependent at each point of K, then there is a function $F : R^m \to R^1$ such that

(a) F is of class $C^\infty$ on $R^m$ and does not vanish identically on any open set.

(b) $F(f_1(x), \ldots, f_m(x)) = 0$ for $x \in K$.

Exercise 6 State and prove the functional-dependence theorem when $R^n$ is replaced by a smooth manifold of class $C^r$.

Here is an interesting example.

Exercise 7 The functions $f_1(x, y) = x$, $f_2(x, y) = xy$, $f_3(x, y) = xye^y$ are functionally dependent in the above sense. Nevertheless, even though these functions are analytic, there is no analytic relation between them on any neighborhood of 0. (A function is analytic if it has a power-series expansion in a neighborhood of each point.)


14 : Differentiation


1 REGULAR BOREL MEASURES

The object of this chapter is to study some important measures on $R^n$ and their relation to the Lebesgue measure.

DEFINITION 1.1 A regular Borel measure on an open set $\Omega \subset R^n$ is an outer measure $\nu$ on $\Omega$ such that every open set is measurable, every point has a neighborhood of finite measure, and every set is contained in a $G_\delta$ of the same measure.

Exercise 1 For any $A \subset \Omega$, $\nu(A) = \inf \nu(G)$ over the open $G \supset A$. A is $\nu$ measurable if and only if A differs from a $G_\delta$ by a set of measure 0 and, also, if and only if A differs from a $K_\sigma$ by a set of measure 0.

The Lebesgue measure is the basic regular Borel measure. Unqualified terms such as measurable, integrable, and so on, will always refer to it. As usual, the Lebesgue measure of a set A will be written $|A|$, or $|A|_n$ if it is necessary to display the dimension, and the Lebesgue integral will be written $\int f \, dx$.

Example 1 (Regular Borel Measures on $R^1$) A regular Borel measure $\nu$ on $R^1$ determines a function $\alpha$ as follows:

$$\alpha(x) = \nu((0, x]) \ \text{ if } x \ge 0, \qquad \alpha(x) = -\nu((x, 0]) \ \text{ if } x < 0.$$

Exercise 2 The function $\alpha$ is nondecreasing, continuous on the right, and takes the value 0 at 0. When is $\alpha$ continuous?

Conversely, let $\alpha$ be any nondecreasing function and define the “length” of an open interval $I = (a, b)$ by

$$l_\alpha(I) = \alpha(b - 0) - \alpha(a + 0),$$

where $\alpha(b - 0)$ and $\alpha(a + 0)$ denote the left- and right-hand limits of $\alpha$ at b and a. Then define a measure $\nu_\alpha$, as in the Lebesgue case, by

$$\nu_\alpha(A) = \inf \sum l_\alpha(I_k),$$

where $\{I_k\}$ is a sequence of open intervals covering A. It is not necessary to use open intervals, but some care must be taken to get things straight at points where $\alpha$ is not continuous. For instance, if I is the half-open interval $(a, b]$, put $l_\alpha(I) = \alpha(b + 0) - \alpha(a + 0)$, then cover by half-open intervals, and so on.

Exercise 3 Show that $\nu_\alpha$ is a regular Borel measure and that $\nu_\alpha(I) = l_\alpha(I)$ when I is an open or half-open interval. What is it when I is half-open on the other end, or closed?

Exercise 4 Discuss the correspondence between increasing functions and regular Borel measures furnished by Exercises 2 and 3.

As far as $R^1$ is concerned, this chapter is essentially the theory of increasing functions.

Example 2 (Indefinite Integrals) The measure $\nu$ of Theorem 1.2 is called the indefinite integral of f. As mentioned above, the unqualified terms measurable, integrable, and so on, refer to the Lebesgue measure.

THEOREM 1.2 If f is nonnegative, measurable, and integrable over every compact set in $\Omega$, then there is a unique regular Borel measure $\nu$ on $\Omega$ such that for every measurable set $E \subset \Omega$

$$\nu(E) = \int_E f \, dx. \tag{1}$$

Exercise 5 Prove the theorem. [Hint: When A is not measurable, define $\nu(A) = \inf \nu(E)$ over all measurable $E \supset A$. Use Theorem 7.6 of Chapter 13.]

Exercise 6 If a given regular Borel measure $\nu$ is the indefinite integral of f and also the indefinite integral of g, then f = g a.e. on $\Omega$.

Exercise 7 If the regular Borel measure $\nu$ is an indefinite integral, then every Lebesgue measurable set is $\nu$ measurable. [Hint: If $|E| = 0$, then $\nu(E) = 0$.]

When $\nu$ is an indefinite integral, every integral with respect to $\nu$ can be expressed as a Lebesgue integral as follows:

THEOREM 1.3 Let the regular Borel measure $\nu$ be the indefinite integral of f. If g is nonnegative and $\nu$ measurable, then gf is Lebesgue measurable, and

$$\int g \, d\nu = \int g f \, dx. \tag{2}$$


Proof To simplify a little, we shall suppose that g is defined a.e. with respect to $\nu$ and shall leave the case where it is not to the reader. This does not mean, however, that g is defined a.e. with respect to the Lebesgue measure. We shall have to use the convention that the product gf is 0 wherever f is 0, whether g is defined there or not.

First suppose that $g = \chi_N$, where $\nu(N) = 0$. By the definition of a regular Borel measure, there is a $G_\delta$ set $E \supset N$ with $\nu(E) = 0$. By formula (1), $\chi_E f = 0$ a.e.; therefore, $\chi_N f = 0$ a.e., so $\chi_N f$ is measurable.

Next, suppose that $g = \chi_A$, where A is $\nu$ measurable. Then there is a $G_\delta$ set $E \supset A$ such that $\nu(E - A) = 0$; hence

$$\chi_A f = \chi_E f - \chi_{E - A} f$$

is measurable. Integration gives formula (2).

If $\varphi$ is simple with respect to $\nu$, then $\varphi$ is a linear combination of characteristic functions of $\nu$ measurable sets, so $\varphi f$ is measurable. Clearly (2) holds, since it holds for each of the characteristic functions.

Now let $\{\varphi_n\}$ be an increasing sequence of simple functions with respect to $\nu$, which converge to g a.e. with respect to $\nu$. Then the sequence $\{\varphi_n f\}$ is an increasing sequence, and by what has been proved it converges to gf a.e., so gf is measurable. The monotone convergence theorem gives formula (2).

Remark This is the pattern for the proofs of many formulas. First come characteristic functions of sets of measure 0, then characteristic functions of measurable sets, then simple functions, and finally increasing sequences.

The basic problems are the differentiability of regular Borel measures and the relationship between the derivative and the indefinite integral.

DEFINITION 1.4 The regular Borel measure $\nu$ is differentiable at the point x if the limit

$$D\nu(x) = \lim_{|B| \to 0} \frac{\nu(B)}{|B|}$$

exists, where B denotes an open ball containing x. $D\nu(x)$ is called the derivative of $\nu$ at x.


Exercise 8 The definition is the same if the balls are required to be closed instead of open. (Hint: The point here is that the open and closed balls have the same Lebesgue measure, though not necessarily the same $\nu$ measure.)

Exercise 9 If $\alpha$ is a nondecreasing function on $R^1$, and $\nu_\alpha$ is the corresponding measure, then $D\nu_\alpha(x) = \alpha'(x)$.

The basic theorem is as follows.

THEOREM 1.5 Every regular Borel measure $\nu$ is differentiable almost everywhere. It is an indefinite integral if and only if $\nu(N) = 0$ whenever $|N| = 0$, and in this case it is the integral of its derivative.


This theorem, which is quite hard, is established in Sections 2 and 3. A measure with the property that $\nu(N) = 0$ whenever $|N| = 0$ is called absolutely continuous. Let us look at a couple of examples of measures that are not absolutely continuous. As a very simple example there is the counting type measure where $\nu(A)$ is the number of points in A with integer coordinates. In this case $D\nu(x) = 0$ if x does not have integer coordinates, and $D\nu(x)$ is undefined (or is $+\infty$, if you prefer) if x does have integer coordinates.
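
For the counting type measure in dimension one, the quotients that define the derivative can be computed directly. The following is a small sketch; the helper names `nu_open_interval` and `quotient`, and the restriction to balls centered at x, are assumptions made for the illustration.

```python
import math

def nu_open_interval(a, b):
    """Number of integers n with a < n < b (counting type measure of (a, b))."""
    return max(0, math.ceil(b) - math.floor(a) - 1)

def quotient(x, radius):
    """nu(B) / |B| for the open ball B = (x - radius, x + radius) in R^1."""
    a, b = x - radius, x + radius
    return nu_open_interval(a, b) / (b - a)

# At a point without integer coordinates the quotient is 0 for small balls.
for radius in (0.25, 0.1, 0.01):
    print("x = 0.5, radius", radius, "->", quotient(0.5, radius))

# At an integer point the quotient is 1 / (2 * radius), which is unbounded.
for radius in (0.25, 0.1, 0.01):
    print("x = 0.0, radius", radius, "->", quotient(0.0, radius))
```

The first family of quotients is identically 0, while the second grows without bound as the radius shrinks, in line with the two cases described above.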

A more interesting example is the Cantor function (named after Georg Cantor), which is a function $\alpha$ defined on the interval $I_0 = [0, 1]$ by repeating the following construction: If I is a closed interval such that $\alpha$ is already defined at the end points but not in the interior, then on the closed middle third of I define $\alpha$ to have the constant value equal to the average of the values at the end points. Start with $I_0 = [0, 1]$ and with $\alpha(0) = 0$ and $\alpha(1) = 1$. Let $J_1$ be the open middle third and let $I_1$ be the rest. Define $\alpha$ on $\bar{J}_1$ by the prescription above, which means $\alpha = \frac{1}{2}$. Now, $I_1$ is the union of two closed intervals. Let $J_2$ be the union of the two open middle thirds, and on each of these closed middle thirds define $\alpha$ by the prescription above. (On the left-hand one it will be $\frac{1}{4}$ and on the right-hand one it will be $\frac{3}{4}$.) Let $I_2 = I_1 - J_2$. In general $I_k$ is the union of $2^k$ closed intervals, each of length $1/3^k$, $J_{k+1}$ is the union of the $2^k$ open middle thirds, and then $I_{k+1} = I_k - J_{k+1}$. If $\alpha$ is already defined on $\bar{J}_k$, then it is defined on each interval of $\bar{J}_{k+1}$ by the prescription above, that is, as the average of its values at the two end points of the corresponding interval in $I_k$. The first few stages look as shown in Figure 1. By this procedure $\alpha$ is defined on the open set

$$G = \bigcup J_k$$

(actually on the union of the $\bar{J}_k$) and is undefined on a subset of the compact set

$$C = I_0 - G = \bigcap I_k.$$

Since $I_k$ is the union of $2^k$ intervals each of length $1/3^k$, its measure is $2^k/3^k$, so the measure of C is 0.

[Figure 1]

Exercise 10 Show that the function $\alpha$ has a unique continuous extension to the whole interval [0, 1].

Exercise 11 Show that $\alpha'(x) = 0$ for each $x \in G$; hence $\alpha'(x) = 0$ a.e. in spite of the fact that $\alpha$ is continuous and nondecreasing, and actually increases from 0 to 1. In particular, $\alpha$ is not the integral of its derivative.
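
The construction of the Cantor function can be followed on a computer. The sketch below is one possible way to evaluate it, using base-3 digits and exact rational arithmetic; the function names and the `depth` cutoff are assumptions of this illustration, not part of the text. A first digit 1 means the point lies in a closed middle third, where the value is the average of the values at the end points.

```python
from fractions import Fraction

def cantor(x, depth=40):
    """Cantor function on [0, 1], evaluated digit by digit in base 3.
    Exact at points of the closed middle thirds; elsewhere truncated
    after `depth` ternary digits."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    scale = Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:          # x lies in a middle third at this stage
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value

def remaining_length(k):
    """Lebesgue measure of the k-th stage: 2**k intervals of length 3**-k."""
    return Fraction(2, 3) ** k

print(cantor(Fraction(1, 3)), cantor(Fraction(2, 3)))   # first middle third
print(cantor(Fraction(1, 9)), cantor(Fraction(8, 9)))   # second-stage thirds
print([float(remaining_length(k)) for k in (1, 5, 10, 20)])
```

The values on the first middle thirds are 1/2, 1/4, and 3/4, and the measure of the k-th stage, $(2/3)^k$, tends to 0, matching the statement that C has measure 0.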


If $\nu_\alpha$ is the corresponding measure, then clearly $\nu_\alpha(I) = 0$ for each of the intervals I that make up $J_k$. Hence $\nu_\alpha(J_k) = 0$; therefore, $\nu_\alpha(G) = 0$. On the other hand, $\nu_\alpha([0, 1]) = 1$ because $\alpha(0) = 0$ and $\alpha(1) = 1$, so $\nu_\alpha(C) = 1$. Since $|C| = 0$, this shows that $\nu_\alpha$ is not absolutely continuous. In fact, this is the extreme opposite of absolute continuity. The interval [0, 1] splits into the two disjoint parts G and C such that $\nu_\alpha(C) = 1$, $|C| = 0$, while $\nu_\alpha(G) = 0$, $|G| = 1$. It is possible to make similar examples where the function $\alpha$ is not just nondecreasing but strictly increasing.

[Figure: Georg Cantor]

It was toward the end of the last century that such bizarre functions began appearing. They were viewed with alarm by many of the leading mathematicians of the day, one of the greatest of whom was Henri Poincaré.

[Figure: Henri Poincaré]

“In the old days people invented new functions with some practical aim in mind. Nowadays they do it just to poke holes in the reasonings of their fathers, and they won’t ever get more from these functions than that.” In spite of such opinions the work of Cantor was fundamental.

Example 3 (Change of Variable) For the purpose of calculus the most important regular Borel measure is the one that arises from a change of variable $y = \varphi(x)$ in an integral $\int f(y) \, dy$. Let $\varphi : R^n \to R^n$ be continuous and one to one on an open set $\Omega$ and define

$$\nu(A) = |\varphi(A)| \quad \text{for } A \subset \Omega. \tag{3}$$

Exercise 12 If $\varphi(A)$ is Lebesgue measurable, then A is $\nu$ measurable. (Just use the definition of measurability.) This implies that every compact set in $\Omega$ is $\nu$ measurable and has finite measure.

Exercise 13 If E is a $G_\delta$ that contains $\varphi(A)$ and has the same Lebesgue measure, then $\varphi^{-1}(E)$ is a $G_\delta$ that contains A and has the same $\nu$ measure.

The two exercises together show that $\nu$ is a regular Borel measure on $\Omega$. Now we shall establish the formula

$$\int_{\varphi(\Omega)} f(y) \, dy = \int_\Omega f(\varphi(x)) \, d\nu(x) \tag{4}$$

whenever f is Lebesgue measurable on $\varphi(\Omega)$. If f is the characteristic function of a Lebesgue measurable set $B \subset \varphi(\Omega)$, then $f \circ \varphi$ is the characteristic function of $A = \varphi^{-1}(B)$, so formula (4) is just the definition of $\nu$. Knowing the formula in this case, we obtain it immediately for simple f by linearity, for nonnegative measurable f by the monotone convergence theorem, and for arbitrary measurable f by linearity again.

The basic theorem to be proved in this situation is the following.

THEOREM 1.6 If $\varphi$ is differentiable at a point x, then

$$D\nu(x) = |\det d\varphi(x)|.$$

The proof is given in Section 4 for the case where $\varphi$ is of class $C^1$ at x and in Section 4 of Chapter 16 for the general case.

Suppose, finally, that $\varphi$ is of class $C^1$ on $\Omega$. In this case Corollary 2.10 of Chapter 13 shows that $\nu$ is absolutely continuous. (The Cantor function shows that it is not absolutely continuous in general!) Consequently, formula (4), along with Theorems 1.3, 1.5, and 1.6, gives the change-of-variable formula

$$\int_{\varphi(\Omega)} f(y) \, dy = \int_\Omega f(\varphi(x)) \, |\det d\varphi(x)| \, dx. \tag{5}$$

As an illustration, consider the change from rectangular to polar coordinates, in which the function $\varphi$ is given by $x = r \cos\theta$, $y = r \sin\theta$, $0 < r < \infty$ and $0 < \theta < 2\pi$. In this case $\varphi(\Omega)$ is the whole plane except for the nonnegative x axis, which has measure 0, and $\det d\varphi(r, \theta) = r$; so formula (5) gives

$$\iint f(x, y) \, dx \, dy = \int_0^\infty \int_0^{2\pi} f(r \cos\theta, r \sin\theta) \, r \, d\theta \, dr, \tag{6}$$

which is exactly the formula we have found already for changing from rectangular to polar coordinates.

Exercise 14 In the one-dimensional case the function $\varphi$ must be either strictly increasing or strictly decreasing (since it is one to one), and $\det d\varphi(x) = \varphi'(x)$. How is the absolute value in formula (5) to be reconciled with the formula that we have known all along for changing variable in the one-dimensional case?

In the definition of a regular Borel measure we have used the condition 

A. Every point has a neighborhood of finite measure. 

Later we shall want to consider regular Borel measures on arbitrary metric spaces, not just open sets in $R^n$. In our present situation the condition A is equivalent to the condition

B. Every compact set has finite measure, 
but in general it is not. 




2 DIFFERENTIABILITY THEOREMS 


In proving the differentiability of regular Borel measures it is convenient to introduce upper and lower derivatives which always exist and provide something to work with.

DEFINITION 2.1 Let $\nu$ be a regular Borel measure on $\Omega$. For each $\varepsilon > 0$ let

$$D^{\varepsilon}\nu(x) = \sup \frac{\nu(B \cap \Omega)}{|B|}, \qquad D_{\varepsilon}\nu(x) = \inf \frac{\nu(B \cap \Omega)}{|B|},$$

where the sup and inf are taken over all open balls that contain x and have radius $< \varepsilon$. Let

$$\overline{D}\nu(x) = \lim_{\varepsilon \to 0} D^{\varepsilon}\nu(x), \qquad \underline{D}\nu(x) = \lim_{\varepsilon \to 0} D_{\varepsilon}\nu(x).$$

If $\overline{D}\nu(x) = \underline{D}\nu(x) < \infty$, then $\nu$ is differentiable at x, and its derivative $D\nu(x)$ is the common value.

Exercise 1 This definition of $D\nu(x)$ is consistent with the one in Section 1.

Remark 1 Closed balls, cubes, or various other sets can be used in place of the open balls, but the latter suit our present purposes best. The proofs are not very different in these other approaches.

Remark 2 It is clear that if $\varepsilon < \eta$, then $D^{\varepsilon}\nu(x) \le D^{\eta}\nu(x)$ and $D_{\varepsilon}\nu(x) \ge D_{\eta}\nu(x)$. Therefore,

$$\overline{D}\nu(x) = \inf_{\varepsilon > 0} D^{\varepsilon}\nu(x) \quad \text{and} \quad \underline{D}\nu(x) = \sup_{\varepsilon > 0} D_{\varepsilon}\nu(x).$$

This has two important consequences. One is that the upper and lower derivatives $\overline{D}\nu$ and $\underline{D}\nu$ do exist at every point, at least if $+\infty$ is allowed as a value. The second is that it is not necessary to consider all $\varepsilon > 0$. Any sequence which approaches 0 will do.


THEOREM 2.2 $\overline{D}\nu$ and $\underline{D}\nu$ are measurable functions not only with respect to Lebesgue measure, but with respect to any regular Borel measure.

Proof By the second part of Remark 2 it is enough to show that $D^{\varepsilon}\nu$ and $D_{\varepsilon}\nu$ are measurable. If $D^{\varepsilon}\nu(a) > \alpha$, then a belongs to an open ball B of radius $< \varepsilon$ such that $\nu(B \cap \Omega)/|B| > \alpha$. Since the ball is open, any point x sufficiently close to a belongs to the same ball and, therefore, also satisfies $D^{\varepsilon}\nu(x) > \alpha$. In other words, the set $\{x : D^{\varepsilon}\nu(x) > \alpha\}$ is an open set for each real $\alpha$; hence, its complement $\{x : D^{\varepsilon}\nu(x) \le \alpha\}$ is a closed set for each real $\alpha$. It follows from Theorem 5.6 of Chapter 13 that $D^{\varepsilon}\nu$ is measurable with respect to any regular Borel measure $\mu$, for each closed set is $\mu$ measurable. $D_{\varepsilon}\nu$ is handled similarly.

Remark 3 What we have actually shown is that $D^{\varepsilon}\nu$ is lower semicontinuous on $\Omega$ and that $D_{\varepsilon}\nu$ is upper semicontinuous on $\Omega$. There is one point that might not have been apparent in the proof. For the validity of the statement that $\{x : D^{\varepsilon}\nu(x) > \alpha\}$ is open, it is necessary to interpret $D^{\varepsilon}\nu(x)$ as $+\infty$ at the points where the definition makes it natural to do so. On the other hand, the value $+\infty$ was not considered in the discussion of measurable functions. This is why the final deduction from Theorem 5.6 is based on the fact that the complement $\{x : D^{\varepsilon}\nu(x) \le \alpha\}$ is closed. Later it will be important to know that $\overline{D}\nu$ and $\underline{D}\nu$ are $\nu$ measurable as well as Lebesgue measurable.

The study of regular Borel measures rests mainly on an important technical result called the Vitali covering theorem.

DEFINITION 2.3 A family $\mathcal{F}$ of balls covers a set E in the sense of Vitali if for every $x \in E$ and every $\varepsilon > 0$ there is some ball in $\mathcal{F}$ that contains x and has radius $< \varepsilon$.

THEOREM 2.4 (Vitali’s Covering Theorem) Let $\mathcal{F}$ be a family of balls that covers a set E in the sense of Vitali. Then $\mathcal{F}$ contains a disjoint sequence of balls that covers almost all of E.

The theorem does look technical, but the power of it will be apparent shortly. The true Vitali theorem is somewhat more general, but this version will suffice. It does not matter whether the balls are open or closed. In the proof we shall treat a family of closed balls; but if $\mathcal{F}$ is a family of open balls, then the corresponding family $\overline{\mathcal{F}}$ of closed balls also covers in the sense of Vitali. Therefore, it contains a disjoint sequence $\{\overline{B}_k\}$ that covers almost all of E. Then also the disjoint sequence $\{B_k\}$ covers almost all of E, because $|\overline{B} - B| = 0$ for any ball B.

Exercise 2 Prove the last statement.

Proof To begin with we shall suppose that E is bounded. Let G be a bounded open set containing E, and throw out all the balls in $\mathcal{F}$ that are not contained in G. The new family, which we will still call $\mathcal{F}$, continues to cover E in the sense of Vitali.

The disjoint sequence is defined as follows: Let M, be the upper 
bound of the radii of all balls in §, and choose a ball By = Baym) in F 


differentiability theorems 357 


with r,; > M,/2. Assuming that Bi, ..., B, and My,..., M;, are 
already defined, let M41 be the upper bound of the radii of all the balls 
in & that are disjoint from B,, . . . , By, and then choose such a ball 
Bri = Bangs; rega) With raga > Mayi/2. (If My41 turns out to be 0, then 
rea = 0 and By41 is defined to be empty.) It is clear that the sequence 
{M,,} is decreasing; furthermore, it converges to 0, for My < 2r;, and the 
sequence {7;.} converges to 0 because of the fact that |B,| = cr7, while 


>, |Bl < IG] < @. (1) 
k=1 
Suppose that a point $x \in E$ is not in any of the balls $B_1, \ldots, B_{k_0}$.
Then it is at a positive distance $\delta$ from their union (which is compact).
Choose a ball $B(a; r)$ in $\mathcal{F}$ that contains $x$ and has radius $r < \delta/2$. Then
$B(a; r)$ is disjoint from $B_1, \ldots, B_{k_0}$, because every point of $B(a; r)$ is at
distance $< \delta$ from $x$. Consequently,

$$M_{k_0+1} \ge r. \tag{2}$$

Since $\{M_k\}$ decreases to 0, there is an index $j$ such that

$$M_j \ge r > M_{j+1} \quad\text{and}\quad j > k_0. \tag{3}$$

Since $r > M_{j+1}$, it follows that $B(a; r)$ meets one of the sets $B_1, \ldots, B_j$,
say $B_m$, and necessarily $m > k_0$. Hence, $|x - a_m| \le 2r + r_m$,
while $r \le M_j \le M_m \le 2r_m$. Thus, $|x - a_m| \le 5r_m$, and we have
established the following inclusion:

$$E - \bigcup_{k=1}^{k_0} B_k \subset \bigcup_{k > k_0} B(a_k; 5r_k); \tag{4}$$

and hence the inequality

$$\Bigl|E - \bigcup_{k=1}^{k_0} B_k\Bigr| \le \sum_{k=k_0+1}^{\infty} |B(a_k; 5r_k)| = 5^n \sum_{k=k_0+1}^{\infty} |B_k|. \tag{5}$$

If $k_0$ is large enough, the number on the right is as small as we please by
virtue of (1), so $|E - \bigcup_{k=1}^{\infty} B_k| = 0$, as required.
This takes care of the case when $E$ is bounded. Now let $E$ be
arbitrary. Let

$$G_n = \{x : n - 1 < |x| < n\},$$

let $E_n = E \cap G_n$, and let $\mathcal{F}_n$ consist of the balls in $\mathcal{F}$ that are contained
in $G_n$. It is plain that $\mathcal{F}_n$ covers $E_n$ in the sense of Vitali, so there is a
disjoint sequence $\{B_{nk}\}$ in $\mathcal{F}_n$ that covers almost all of $E_n$. When $n$ and $k$
both vary, the $B_{nk}$ form a disjoint sequence in $\mathcal{F}$ that covers almost all of
$\bigcup E_n$, and this is almost all of $E$.


Exercise 3. Prove the last statement. 
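The selection rule used in the proof can be tried out on a finite family of intervals in $\mathbf{R}^1$. What follows is only a minimal sketch under stated assumptions, not the construction of the proof in full: the family is finite, so the upper bound $M_{k+1}$ is attained and the ball of largest radius may be chosen (its radius certainly exceeds $M_{k+1}/2$). The names `disjoint`, `greedy_disjoint`, and `covered_by_enlargement` are hypothetical helpers introduced for illustration.

```python
import random

def disjoint(b1, b2):
    """Closed intervals [a - r, a + r] are disjoint iff the centers are
    farther apart than the sum of the radii."""
    (a1, r1), (a2, r2) = b1, b2
    return abs(a1 - a2) > r1 + r2

def greedy_disjoint(balls):
    """Pick, at each step, a ball of largest radius among those disjoint
    from every ball already chosen (a finite stand-in for r > M/2)."""
    chosen = []
    candidates = list(balls)
    while candidates:
        best = max(candidates, key=lambda b: b[1])
        chosen.append(best)
        # keep only the balls that are still disjoint from everything chosen
        candidates = [b for b in candidates if disjoint(b, best)]
    return chosen

def covered_by_enlargement(ball, chosen, factor=5.0):
    """True if the ball lies inside B(c; factor * s) for some chosen B(c; s)."""
    a, r = ball
    return any(abs(a - c) + r <= factor * s for c, s in chosen)

random.seed(0)
family = [(random.uniform(0.0, 10.0), random.uniform(0.01, 1.0)) for _ in range(200)]
selected = greedy_disjoint(family)
```

In this finite setting every ball of the family lies in the fivefold enlargement of some selected ball, which is the finite analogue of inclusion (4).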


Now, let us see the use of the Vitali theorem. 


LEMMA 2.5  If $\overline{D}\nu(x) \ge \alpha$ a.e. on a set $E$, then

$$\nu(E) \ge \alpha|E|.$$

Proof  It can be assumed that $\overline{D}\nu(x) \ge \alpha$ everywhere on $E$; for if the lemma is
true in this case and if

$$E' = \{x : x \in E \text{ and } \overline{D}\nu(x) \ge \alpha\},$$

then we have

$$\nu(E) \ge \nu(E') \ge \alpha|E'| = \alpha|E|.$$

Take any $\beta < \alpha$ and any open $G \supset E$, and let $\mathcal{F}$ be the family of all
open balls that are contained in $G$ and satisfy

$$\frac{\nu(B)}{|B|} > \beta. \tag{6}$$

If $x$ is any point of $E$, and $\varepsilon$ is any positive number, then by the definition
of $\overline{D}\nu(x)$ there is a ball of radius $< \varepsilon$ that contains $x$ and satisfies (6). This
means that the family $\mathcal{F}$ covers $E$ in the sense of Vitali. If $\{B_k\}$ is a
disjoint sequence in $\mathcal{F}$ that covers almost all of $E$, then

$$\nu(G) \ge \sum \nu(B_k) \ge \beta \sum |B_k| \ge \beta|E|.$$

Since this is true for every open $G \supset E$, it follows that $\nu(E) \ge \beta|E|$; and
since this is true for every $\beta < \alpha$, it follows that $\nu(E) \ge \alpha|E|$.


Exercise 4  Use the lemma to show that $\overline{D}\nu(x) < \infty$ a.e.


We need a corresponding lemma for the lower derivative but have to be
satisfied with something a little weaker.


LEMMA 2.6  If $\underline{D}\nu(x) \le \alpha$ a.e. on a set $E$, then there is a set $N$ of measure 0 such that

$$\nu(E - N) \le \alpha|E|.$$

Example  Take $\nu$ to be a counting measure on $\mathbf{R}^1$, say $\nu(A)$ = number of integers in $A$.
It is evident that if $x$ is not an integer, then $\overline{D}\nu(x) = \underline{D}\nu(x) = 0$. The set $N$ in
the lemma is the set of integers. What happens with the Cantor measure?
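The counting-measure example can be checked numerically. The sketch below is an illustration only: it uses centered intervals $[x - r, x + r]$, which are a special case of the balls containing $x$, and the function names `count_integers` and `ratio` are hypothetical.

```python
import math

def count_integers(lo, hi):
    """nu(A) for the closed interval A = [lo, hi]: the number of integers in it."""
    return max(0, math.floor(hi) - math.ceil(lo) + 1)

def ratio(x, r):
    """nu(B) / |B| for the ball B = [x - r, x + r] in R^1."""
    return count_integers(x - r, x + r) / (2 * r)

radii = [0.4, 0.1, 0.01, 0.001]
off_integer = [ratio(0.5, r) for r in radii]   # x = 0.5 is not an integer
at_integer = [ratio(3.0, r) for r in radii]    # x = 3 is an integer
```

Away from the integers the quotients are eventually 0, while at an integer they grow without bound as $r \to 0$; this is why the integers have to be removed as the exceptional set $N$.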


Proof  It can be assumed that $\underline{D}\nu(x) \le \alpha$ everywhere on $E$, for this just involves
throwing out another set of measure 0. It can also be assumed that $E$ is
measurable, for if $E'$ is a measurable set containing $E$ and with the same
measure and if $F = \{x : \underline{D}\nu(x) \le \alpha\}$, then $F$ is also a measurable set
containing $E$. If we apply the lemma to the measurable set $E' \cap F$, we get

$$\nu(E - N) \le \nu((E' \cap F) - N) \le \alpha|E' \cap F| = \alpha|E|.$$

Finally, it can be assumed that $E$ is bounded, for if the lemma holds for
each $E \cap B(0; n)$, then it also holds for $E$.


Exercise 5  Prove this last statement.


Let $\varepsilon > 0$ be given and choose an open $G \supset E$ with $|G| < |E| + \varepsilon$. Take
any $\beta > \alpha$ and let $\mathcal{F}$ be the family of open balls contained in $G$ and
satisfying $\nu(B)/|B| < \beta$. As before, $\mathcal{F}$ covers $E$ in the sense of Vitali. Let
$\{B_k\}$ be a disjoint sequence that covers almost all of $E$. If $F_\beta$ is the union
of the $B_k$ and $N_\beta = E - F_\beta$, then

$$\nu(E - N_\beta) = \nu(E \cap F_\beta) \le \nu(F_\beta) = \sum \nu(B_k) \le \beta \sum |B_k| \le \beta|G| \le \beta(|E| + \varepsilon).$$

Now, let $N'$ be the union of the $N_\beta$ for $\beta = \alpha + 1/m$, where $m$ is a
positive integer. Each $N_\beta$ has measure 0, so $N'$ does also, and $\nu(E - N') \le
\nu(E - N_\beta) \le \beta(|E| + \varepsilon)$. Since this holds for each $\beta$, it follows that

$$\nu(E - N') \le \alpha(|E| + \varepsilon).$$

The set $N'$ depends on the initial choice of $\varepsilon$. Form $N_k$ for $\varepsilon = 1/k$, and let
$N$ be the union of the $N_k$. Then $N$ also has measure 0, and $\nu(E - N) \le
\nu(E - N_k) \le \alpha(|E| + 1/k)$ for every $k$, which proves the lemma.


THEOREM 2.7  Every regular Borel measure is differentiable a.e.


Proof  It is plain that $\underline{D}\nu(x) \le \overline{D}\nu(x)$, and it has been seen already in Exercise 4
that $\overline{D}\nu$ is finite a.e., so what has to be proved is that the set

$$E = \{x : \underline{D}\nu(x) < \overline{D}\nu(x)\}$$

has measure 0. Fix any two numbers $\alpha$ and $\beta$, $\alpha < \beta$, and let

$$F = \{x : \underline{D}\nu(x) < \alpha < \beta < \overline{D}\nu(x)\} \cap B(0; m).$$

By Lemma 2.6 there is a set $N$ of measure 0 such that $\nu(F - N) \le \alpha|F|$,
while, by Lemma 2.5, $\nu(F - N) \ge \beta|F - N| = \beta|F|$. Since $\alpha < \beta$, it
follows that $|F| = 0$. Now, $E$ is the union of the sets $F$ with rational $\alpha$
and $\beta$, and integral $m$, and these can be arranged in a sequence. Hence
$|E| = 0$.


Exercise 6  Expand the idea of the last proof to prove the following theorem.


THEOREM 2.8  If $\{\nu_k\}$ is an increasing sequence of regular Borel measures with limit $\nu$, then
$\nu$ is a regular Borel measure and $D\nu_k \to D\nu$ a.e.

[The hypothesis means that for every set $E$, the sequence $\{\nu_k(E)\}$ is increasing
and has limit $\nu(E) < \infty$.]


3  INTEGRATION OF DERIVATIVES


The main theorem on the integration of derivatives is the following:


THEOREM 3.1  If $\nu$ is a regular Borel measure on $\Omega$, then there is a $G_\delta$ set $N$ of measure 0
such that for every measurable set $E \subset \Omega$,

$$\nu(E) = \int_E D\nu\,dx + \nu(E \cap N). \tag{1}$$

We begin with a lemma.


LEMMA 3.2  For every measurable set $E$, we have

$$\nu(E) \ge \int_E D\nu\,dx. \tag{2}$$

Proof  It can be assumed that $E$ is open; for if the inequality holds in this case,
then for every open $G \supset E$ we have

$$\nu(G) \ge \int_G D\nu\,dx \ge \int_E D\nu\,dx,$$

and taking the lower bound over such $G$ gives (2). In the second place,
it can be assumed that $E$ is bounded; for if (2) holds for each $E \cap B(0; n)$,
then it holds for $E$ itself. As a consequence of these two remarks, it can
be assumed that $E$ is bounded and both Lebesgue measurable and $\nu$
measurable.

Let $\varepsilon > 0$ be given and set

$$E_j = \{x : x \in E \text{ and } j\varepsilon \le \overline{D}\nu(x) < (j + 1)\varepsilon\}. \tag{3}$$

Note that $E_j$ is both Lebesgue measurable and $\nu$ measurable. Indeed,
$E$ is measurable in both senses, and so is $\overline{D}\nu$, by virtue of Theorem 2.2.
According to Lemma 2.5, we have $\nu(E_j) \ge j\varepsilon|E_j|$, and obviously we have
$\int_{E_j} D\nu\,dx \le (j + 1)\varepsilon|E_j|$; combining the two we get

$$\nu(E_j) \ge \int_{E_j} D\nu\,dx - \varepsilon|E_j|.$$

Summing over $j$ (and this is where $E_j$ needs to be $\nu$ measurable) gives

$$\nu(E) \ge \int_{\bigcup E_j} D\nu\,dx - \varepsilon\Bigl|\bigcup E_j\Bigr| = \int_E D\nu\,dx - \varepsilon|E|,$$

which in turn gives (2), since $\varepsilon$ is arbitrary.


Exercise 1  The lemma is also true if $E$ is $\nu$ measurable. (The main point is that if $E$ is $\nu$
measurable, then, although it is not necessarily Lebesgue measurable, $\chi_E D\nu$ is
Lebesgue measurable. This fact follows from Lemma 3.2.)


Exercise 2  What is the reason for using $\overline{D}\nu$ instead of $D\nu$ in formula (3)?


Proof of the Theorem  To begin with, let us take $E$ to be bounded and find a corresponding set $N$
that depends on $E$. Let $\varepsilon > 0$ be given, and define $E_j$ as in the proof of
the lemma, formula (3). According to Lemma 2.6, there is a set $N_j$ of
measure 0 such that $\nu(E_j - N_j) \le (j + 1)\varepsilon|E_j|$, while obviously $\int_{E_j} D\nu\,dx \ge
j\varepsilon|E_j|$; combining the two we get

$$\nu(E_j - N_j) \le \int_{E_j} D\nu\,dx + \varepsilon|E_j|.$$

Taking $N_\varepsilon$ to be the union of the $N_j$ and summing on $j$, we get

$$\nu(E - N_\varepsilon) \le \int_E D\nu\,dx + \varepsilon|E|,$$

and, taking $N$ to be the union of the $N_\varepsilon$ for $\varepsilon = 1/k$, we get

$$\nu(E - N) \le \int_E D\nu\,dx. \tag{4}$$


Now we shall show that the same set $N$ works for any subset $F$ of $E$
that is both Lebesgue measurable and $\nu$ measurable. Formula (4) and
Lemma 3.2 give

$$\int_E D\nu\,dx \ge \nu(E - N) = \nu(F - N) + \nu(E - F - N)
\ge \int_{F-N} D\nu\,dx + \int_{E-F-N} D\nu\,dx = \int_E D\nu\,dx.$$

Consequently, equality holds throughout, so that

$$\nu(F - N) = \int_{F-N} D\nu\,dx = \int_F D\nu\,dx. \tag{5}$$

Now, let $E_k$ be the ring

$$E_k = \{x : k \le |x| < k + 1\},$$

which is clearly both Lebesgue measurable and $\nu$ measurable, let $N_k$ be the
set that has just been found for $E = E_k \cap \Omega$, and let $N$ be a $G_\delta$ of measure 0
that contains the union of the $N_k$. If $F$ is both Lebesgue measurable and
$\nu$ measurable, then so is $F \cap E_k$, and by (5) we have

$$\nu(F - N) = \sum_k \nu(F \cap E_k - N) \le \sum_k \nu(F \cap E_k - N_k) = \sum_k \int_{F \cap E_k} D\nu\,dx = \int_F D\nu\,dx.$$

Finally, if $E$ is any Lebesgue measurable set, let $F$ be a $G_\delta$ that contains
it and satisfies $|F - E| = 0$. Then

$$\nu(E - N) \le \nu(F - N) \le \int_F D\nu\,dx = \int_E D\nu\,dx.$$

On account of Lemma 3.2, equality must hold throughout; therefore, as
$N$ is $\nu$ measurable,

$$\nu(E) = \nu(E - N) + \nu(E \cap N) = \int_E D\nu\,dx + \nu(E \cap N),$$

which proves the theorem.


Exercise 3  Formula (1) holds when $E$ is $\nu$ measurable as well as when $E$ is Lebesgue
measurable. (As in Exercise 1, the point is that $\chi_E D\nu$ is Lebesgue measurable,
even though $\chi_E$ itself is not.)


Theorem 3.1 gives an immediate solution to the problem of which regular
Borel measures are indefinite integrals. They are those which are absolutely
continuous in the following sense.


DEFINITION 3.3  A regular Borel measure $\nu$ is absolutely continuous if $\nu(E) = 0$ whenever
$|E| = 0$.


THEOREM 3.4  A regular Borel measure $\nu$ on $\Omega$ is an indefinite integral if and only if it is
absolutely continuous. If this is the case, then for every $\nu$-measurable set
$E \subset \Omega$ we have

$$\nu(E) = \int_E D\nu\,dx,$$

and for every nonnegative $\nu$-measurable function $f$ we have

$$\int f\,d\nu = \int f\,D\nu\,dx.$$

It is plain on the face of it that if $\nu$ is an indefinite integral, then it is absolutely
continuous; and it is plain from Theorem 3.1 that if $\nu$ is absolutely continuous,
then it is the indefinite integral of $D\nu$, for the term $\nu(E \cap N)$ drops out. The
last assertion comes from Theorem 1.3. Note that Theorem 1.5 now is proved
completely.
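The formula $\nu(E) = \int_E D\nu\,dx$ can be illustrated numerically in one dimension. The sketch below rests on assumptions that are not in the text: the measure is taken to be $\nu([a, b]) = b^3 - a^3$ on $[0, 1]$ (an absolutely continuous example chosen for illustration), the derivative is approximated by the quotient $\nu(B)/|B|$ over a small centered interval, and the integral by a midpoint sum. The names `nu`, `derivative`, and `integral_of_derivative` are hypothetical.

```python
def nu(a, b):
    """Hypothetical absolutely continuous measure of the interval [a, b]."""
    return b ** 3 - a ** 3

def derivative(x, r=1e-4):
    """Approximate D nu(x) by nu(B) / |B| for the small ball B = [x - r, x + r]."""
    return nu(x - r, x + r) / (2 * r)

def integral_of_derivative(a, b, n=1000):
    """Midpoint-rule approximation of the integral of D nu over [a, b]."""
    h = (b - a) / n
    return sum(derivative(a + (i + 0.5) * h) for i in range(n)) * h

lhs = nu(0.2, 0.9)                       # nu(E) for E = [0.2, 0.9]
rhs = integral_of_derivative(0.2, 0.9)   # integral of D nu over E
```

The two numbers agree up to the discretization error of the quotient and of the midpoint sum.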


COROLLARY 3.5  If $\nu$ is the indefinite integral of $f$, then $D\nu = f$ a.e. on $\Omega$.


Exercise 4  Prove the corollary. (See Exercise 5 of Section 1.)


Exercise 5  If $\nu$ is absolutely continuous, then every Lebesgue measurable set or function is
$\nu$ measurable.


Exercise 6  A regular Borel measure $\nu$ is absolutely continuous if and only if it has the
following property: For every set $F$, with $\nu(F) < \infty$, and every $\varepsilon > 0$, there is a
$\delta > 0$ such that if $E \subset F$ and $|E| < \delta$, then $\nu(E) < \varepsilon$.


Exercise 7  Interpret the property in Exercise 6 for increasing functions on $\mathbf{R}^1$.


Exercise 8  For any set $A \subset \mathbf{R}^n$, set $\nu_A(E) = |A \cap E|$. Show that $D\nu_A(x) = 1$ a.e. on $A$
and that if $A$ is measurable, then also $D\nu_A(x) = 0$ a.e. on $\mathbf{R}^n - A$. [Hint:
If $A$ is measurable, then $\nu_A$ is the indefinite integral of the characteristic function
of $A$. If $A$ is not measurable, find a measurable $B \supset A$ such that $\nu_B(E) = \nu_A(E)$
for every measurable $E$. Why doesn't the argument show that $D\nu_A(x) = 0$
a.e. on $\mathbf{R}^n - A$?] If $D\nu_A(x) = 1$, then $x$ is called a point of density of the set $A$.


Exercise 9  If $f: \mathbf{R}^n \to \mathbf{R}^m$ is locally integrable, then for almost every point $x$

$$\lim_{\delta(B) \to 0} \frac{1}{|B|} \int_B |f(y) - f(x)|\,dy = 0, \tag{6}$$

the limit being taken over the balls that contain $x$. [Hint: Suppose that the
limit superior in (6) is $> \alpha > 0$ on a set $E$ of positive measure. Choose a small $\varepsilon$
and let $E_j = \{x \in E : j\varepsilon \le f(x) < (j + 1)\varepsilon\}$. Some $E_j$ must have positive
measure, and hence by Exercise 8 a point of density. Show that the limit
superior cannot be $> \alpha$ at a point of density of $E_j$ at which Corollary 3.5 holds
for $|f|$.] The points where (6) holds are called the Lebesgue points of the
function $f$.


The Lebesgue points of a function are important in many theorems, such
as the following.


Exercise 10  Let $e: \mathbf{R}^n \to \mathbf{R}^1$ be bounded and measurable and vanish for $|x| > 1$. Let
$\int e(x)\,dx = 1$ and set $e_\rho(x) = \rho^{-n} e(x/\rho)$. Show that if $f$ is locally integrable, then
$f * e_\rho(x) \to f(x)$ at each Lebesgue point of $f$. (This sort of thing is useful in
connection with the process of regularization described in Chapter 13.)


Remark  The theorems of these first three sections can be carried through in an abstract
setting where the space $\mathbf{R}^n$ is replaced by a metric space $X$, and the Lebesgue
measure is replaced by some basic regular Borel measure $\mu$ which satisfies the
following conditions:

A. There is a constant $c$ such that $\mu(B(a; 2r)) \le c\,\mu(B(a; r))$ for every ball $B(a; r)$.

B. If $\{B_k\}$ is a sequence of balls with $\mu(B_k) \to 0$, then $\delta(B_k) \to 0$.

C. If $\{B_k\}$ is a sequence of balls with $\delta(B_k) \to \infty$, then $\mu(B_k) \to \infty$.

The results are particularly interesting when $X$ is a smooth $n$-dimensional
manifold and $\mu$ is the Hausdorff area measure, which is the subject of the next
chapter. They also show that the metric in $\mathbf{R}^n$ can be changed so that balls
become cubes, for example, while the theorems remain the same.


Exercise 11  Go back through the chapter in the light of this remark. (Almost everything
will remain unchanged.)


4  CHANGE OF VARIABLE


In this section we shall carry out the details of the change of variable formula

$$\int_{\varphi(\Omega)} f(y)\,dy = \int_\Omega f(\varphi(x))\,|\det d\varphi(x)|\,dx \tag{1}$$

that was discussed somewhat briefly in Section 1. The ideas in the proof will
appear again in the next chapter in a more elaborate form, and the theorem
itself will be improved. To begin with, let us improve Theorem 2.7.


THEOREM 4.1  If $\varphi: \mathbf{R}^n \to \mathbf{R}^n$ satisfies $|\varphi(x) - \varphi(y)| \le M|x - y|$ on a set $A$, then

$$|\varphi(A)| \le M^n|A|.$$

Proof  Clearly we can assume that $|A| < \infty$. Let $\varepsilon > 0$ be given and choose an
open $G \supset A$ such that $|G| < |A| + \varepsilon$. Let $\mathcal{F}$ be the family of balls that
are contained in $G$ and have center in $A$. This family covers $A$ in the
sense of Vitali, so there is a disjoint sequence $\{B_k\}$ that covers almost all
of $A$. By Theorem 2.7, $\varphi$ takes sets of measure 0 into sets of measure 0,
so we have

$$|\varphi(A)| \le \sum |\varphi(A \cap B_k)|.$$

Now, if $B = B(a; r)$ is a ball with center in $A$, then $\varphi(A \cap B) \subset B(\varphi(a); Mr)$;
therefore, $|\varphi(A \cap B)| \le M^n|B|$, so

$$|\varphi(A)| \le M^n \sum |B_k| \le M^n|G| \le M^n(|A| + \varepsilon).$$


THEOREM 4.2  If $T: \mathbf{R}^n \to \mathbf{R}^n$ is a linear transformation, then for any set $A$

$$|T(A)| = |\det T|\,|A|.$$


Proof  If $U$ is an orthogonal transformation, then the previous theorem applies with
$M = 1$ to give $|U(A)| \le |A|$. It also applies to $U^{-1}$ to give $|U(A)| \ge |A|$.
This proves the theorem when $U$ is orthogonal, for the determinant
of an orthogonal transformation is $\pm 1$. It also proves the interesting
fact that the Lebesgue measure in $\mathbf{R}^n$ is independent of the (orthonormal)
coordinate system, for a coordinate change is effected by an orthogonal
transformation.

Let $H$ be self-adjoint, and choose the coordinate axes along the
eigenvectors of $H$ (which can be done without changing the measures by
the previous part of the proof). Then Exercise 13, Section 2 of Chapter
13, gives

$$|H(A)| = |\lambda_1 \cdots \lambda_n|\,|A| = |\det H|\,|A|,$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $H$. If $T$ is nonsingular, then $T = UH$, so

$$|T(A)| = |\det U|\,|H(A)| = |\det U|\,|\det H|\,|A| = |\det T|\,|A|.$$

If $T$ is singular, then $\det T = 0$ and $|T(A)| = 0$, because the range of $T$
is contained in a subspace of dimension $< n$.


THEOREM 4.3  If $\varphi: \mathbf{R}^n \to \mathbf{R}^n$ is of class $C^1$ at a point $a$, then

$$\lim_{r \to 0} \frac{|\varphi(B(a; r))|}{|B(a; r)|} = |\det d\varphi(a)|.$$

Proof  We begin with the case where $d\varphi(a) = I$. In this case Theorem 7.2 of
Chapter 10 shows that

$$|B(\varphi(a); (1 + \varepsilon)r)| \ge |\varphi(B(a; r))| \ge |B(\varphi(a); (1 - \varepsilon)r)|$$

if $r$ is small enough. Dividing by $|B(a; r)|$ we get

$$(1 + \varepsilon)^n \ge \frac{|\varphi(B(a; r))|}{|B(a; r)|} \ge (1 - \varepsilon)^n.$$

Letting $r \to 0$ we see that the limits superior and inferior are both between
$(1 + \varepsilon)^n$ and $(1 - \varepsilon)^n$. Since $\varepsilon > 0$ is arbitrary, this implies that the
limit exists and is equal to 1.

Next suppose that $T = d\varphi(a)$ is nonsingular, and put $\psi = T^{-1} \circ \varphi$ so
that $\varphi = T \circ \psi$ and $d\psi(a) = I$. By Theorem 4.2 and what has just been
proved, we have

$$\lim_{r \to 0} \frac{|\varphi(B(a; r))|}{|B(a; r)|} = |\det T| \lim_{r \to 0} \frac{|\psi(B(a; r))|}{|B(a; r)|} = |\det T|.$$

Finally, suppose that $T = d\varphi(a)$ is singular. From the definition of
the differential we have

$$|\varphi(x) - \varphi(a) - T(x - a)| \le \varepsilon|x - a| \quad\text{if } |x - a| < r$$

with $r$ small enough. If we choose the coordinates in $\mathbf{R}^n$ so that the range
of $T$ is contained in $\mathbf{R}^{n-1}$ and use primes to denote the first $n - 1$
coordinates, then we get

$$|\varphi'(x) - \varphi'(a)| \le (M + \varepsilon)|x - a|$$

and

$$|\varphi_n(x) - \varphi_n(a)| \le \varepsilon|x - a|$$

with $M = \|T\|$; hence

$$\varphi(B(a; r)) \subset B'(\varphi'(a); (M + 1)r) \times [\varphi_n(a) - \varepsilon r,\ \varphi_n(a) + \varepsilon r].$$

Fubini's theorem gives

$$|\varphi(B(a; r))| \le c'\,((M + 1)r)^{n-1} \cdot 2\varepsilon r = 2c'(M + 1)^{n-1}\varepsilon\,r^n$$

with $c' = |B'(0; 1)|$; consequently,

$$\lim_{r \to 0} \frac{|\varphi(B(a; r))|}{|B(a; r)|} = 0,$$

which proves the theorem.
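Theorem 4.3 can be observed numerically in the plane. The sketch below is an illustration under assumptions not taken from the text: the map is the hypothetical example $\varphi(x, y) = (x^2 - y^2, 2xy)$, the point is $a = (1, 0.5)$ (near which this map is one to one), and the area of $\varphi(B(a; r))$ is approximated by the shoelace formula applied to the image of a fine polygon inscribed in the boundary circle.

```python
import math

def phi(x, y):
    """Hypothetical example map (x, y) -> (x^2 - y^2, 2xy)."""
    return (x * x - y * y, 2.0 * x * y)

def det_dphi(x, y):
    """Determinant of the differential [[2x, -2y], [2y, 2x]] of phi."""
    return 4.0 * (x * x + y * y)

def image_area(a, r, n=4000):
    """Shoelace area of the polygon through the images of n points on the
    circle of radius r about a; coordinates are taken relative to phi(a)."""
    cx, cy = phi(*a)
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        u, v = phi(a[0] + r * math.cos(t), a[1] + r * math.sin(t))
        pts.append((u - cx, v - cy))
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(twice_area) / 2.0

a = (1.0, 0.5)
ratios = [image_area(a, r) / (math.pi * r * r) for r in (0.1, 0.01, 0.001)]
```

As $r$ decreases, the quotients in `ratios` approach `det_dphi(*a)`, which is 5 for this choice of $a$.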


Now we are ready to change variables. If $\varphi: \mathbf{R}^n \to \mathbf{R}^n$ is continuous and
one to one on the open set $\Omega$, the discussion in Section 1 shows that

$$\nu(A) = |\varphi(A)|$$

is a regular Borel measure on $\Omega$ such that

$$\int_{\varphi(\Omega)} f(y)\,dy = \int_\Omega f(\varphi(x))\,d\nu$$

for every Lebesgue measurable function $f$ on $\varphi(\Omega)$. If $\nu$ is absolutely continuous
(and by Corollary 2.10 of Chapter 13 this is the case if $\varphi$ is of class $C^1$), then
Theorem 3.4 gives

$$\int_{\varphi(\Omega)} f(y)\,dy = \int_\Omega f(\varphi(x))\,D\nu(x)\,dx.$$

Finally, if $\varphi$ is of class $C^1$ at almost every point, Theorem 4.3 shows that $D\nu(x) =
|\det d\varphi(x)|$ almost everywhere.

As far as the absolute continuity of $\nu$ is concerned, it is a little unsatisfactory
to achieve this by assuming that $\varphi$ is of class $C^1$ at all points. For instance,
$\varphi(x) = \sqrt[3]{x}$ is of class $C^1$ at each point $x \ne 0$, but not of class $C^1$ at 0. The
corresponding $\nu$ is certainly absolutely continuous, and $\varphi$ makes a perfectly good
change of variable. In order to state the final theorem on change of variable
we shall just define away this problem, but then we shall make some remarks
on the definition afterward.


DEFINITION 4.4  A function $\varphi: \mathbf{R}^n \to \mathbf{R}^n$ is absolutely continuous on a set $A$ if $\varphi$ is continuous
and $\varphi(N)$ has measure 0 whenever $N \subset A$ has measure 0.


The final theorem reads:


THEOREM 4.5  If $\varphi: \mathbf{R}^n \to \mathbf{R}^n$ is one to one and absolutely continuous on the open set $\Omega$ and
is of class $C^1$ at almost every point of $\Omega$, then

$$\int_{\varphi(\Omega)} f(y)\,dy = \int_\Omega f(\varphi(x))\,|\det d\varphi(x)|\,dx$$

for every Lebesgue measurable function $f$.


The usual way to establish absolute continuity is to decompose the set $\Omega$
into two parts such that $\varphi$ is of class $C^1$ on one part and takes the other part into
a set of measure 0.


Exercise 1  If $\varphi$ is continuous on $A \cup B$ and absolutely continuous on $A$ and $B$ separately,
then $\varphi$ is absolutely continuous on $A \cup B$.


Exercise 2  If $\varphi$ is continuous on $A$ and $|\varphi(A)| = 0$, then $\varphi$ is absolutely continuous on $A$.


Exercise 3  Let $\varphi$ be continuous on $A$ and suppose that there is a subset $N$ such that
$|\varphi(N)| = 0$ and $\varphi$ is of class $C^1$ on $A - N$. Then $\varphi$ is absolutely continuous
on $A$.


The result of the last exercise is the one that usually is used to prove
absolute continuity. In the example $\varphi(x) = \sqrt[3]{x}$, for instance, it applies with
$N = \{0\}$. It is important to realize that $\varphi(N)$ must have measure 0. Whether
$N$ has measure 0 is irrelevant. If $\alpha$ is the Cantor function described in Section 1,
then $\alpha$ is of class $C^1$ on the set $G$ (which means almost everywhere), but it is
not absolutely continuous. To get a similar example where the function $\varphi$ is
one to one, just take $\varphi(x) = \alpha(x) + x$.


Remark  There is no standard definition of absolute continuity except in the
one-dimensional case where the function $\varphi$ is defined on an interval. Unfortunately,
the standard definition in this case is a little bit different from the one above.
It requires in addition that the curve $y = \varphi(x)$ have finite length on every
compact subinterval. For example, the function $\varphi(x) = x \sin 1/x$ is absolutely
continuous in our sense but not in the classical sense. The term originated in
the study of increasing functions and differences of increasing functions. In
this setting the finiteness of the length is automatic, but it is not a natural
condition to impose in general. In Chapter 15 we shall study absolutely
continuous functions from $\mathbf{R}^m$ to $\mathbf{R}^n$ for any $m \le n$.
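The Cantor function example mentioned before the remark can be made concrete. The sketch below is illustrative and uses hypothetical helper names (`cantor`, `stage`): it evaluates the Cantor function from the ternary expansion with exact fractions, builds the $2^n$ closed intervals of the $n$th stage of the Cantor construction, and compares the total length of these intervals with the total length of their images. The first tends to 0 while the second stays equal to 1, so a set of measure 0 is carried to a set of positive measure.

```python
from fractions import Fraction

def cantor(x, depth=40):
    """Cantor function on [0, 1], computed from the ternary digits of x.
    Exact for points whose ternary expansion ends within `depth` digits."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit (x is nonnegative)
        x -= digit
        if digit == 1:            # x lies in a removed middle third
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value

def stage(n):
    """The 2**n closed intervals left after n steps of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            nxt += [(lo, lo + third), (hi - third, hi)]
        intervals = nxt
    return intervals

n = 8
pieces = stage(n)
cover_length = sum(hi - lo for lo, hi in pieces)
image_length = sum(cantor(hi) - cantor(lo) for lo, hi in pieces)
```

Here `cover_length` equals $(2/3)^n$, whereas `image_length` equals 1 for every $n$.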


5  DIFFERENTIABILITY OF LIPSCHITZ FUNCTIONS


H. Rademacher found a fundamental theorem on the differentiability of
Lipschitz functions.


THEOREM 5.1  (H. Rademacher) If $f: \mathbf{R}^n \to \mathbf{R}^m$ is locally Lipschitzian on an open set $\Omega$,
then $f$ is differentiable almost everywhere on $\Omega$.


Actually he proved a bit more, but this will suffice. Before taking up the
proof we gather a few preliminaries.

Since a function is differentiable if and only if each coordinate function
is, we can assume that $m = 1$. Since differentiability is local, we can assume
that $f$ is Lipschitz, not just locally Lipschitz, and that $\Omega$ is a cube. Thus,
$f$ is real-valued, $\Omega$ is a cube, and

$$|f(x) - f(y)| \le M|x - y|. \tag{1}$$


Exercise 1  Prove the theorem when $n = 1$ (in which case it is a much earlier theorem of
Lebesgue). Hint: if $f$ is nondecreasing, this follows from Exercise 3, Section 1,
and Theorem 2.7. In general, $f$ is a difference of nondecreasing Lipschitz
functions.


Exercise 2  For each $\theta$, $D_\theta f$ exists almost everywhere and is measurable. In particular,

$$\lim_{h \to 0} \frac{f(x', x_n + h) - f(x', x_n)}{h} = D_n f(x) \tag{2}$$

almost everywhere on $\Omega$.

This will be combined with the following.


Exercise 3  (Egoroff's theorem) Let $\mu$ be a finite measure on a set $X$. Let $g_k$ be defined
a.e. and measurable, and let $g_k \to g$ a.e. For $\lambda > 0$ there is a measurable set
$K$, with $\mu(X - K) < \lambda$, on which the convergence is uniform. If $X$ is a subset
of $\mathbf{R}^n$, $K$ can be taken compact.


Exercise 4  The set $A$ of points $a = (a', a_n)$ such that the section $f_{a_n}$ is differentiable at $a'$
is measurable. (Recall that $f_{a_n}(x') = f(x'; a_n)$.)


Proof of the Theorem  Assume, for purposes of induction, that the theorem holds in dimension
$n - 1$. (The case $n = 1$ is covered by Exercise 1.) Let $\lambda > 0$ be given,
and use (2) and Egoroff's theorem to find a compact $K$, $|\Omega - K| < \lambda$,
on which the convergence in (2) is uniform. Then $D_n f$ is continuous,
hence uniformly continuous, on $K$. We will show that $f$ is differentiable
almost everywhere on $K$. Then we can take a sequence $\lambda_k \to 0$ and
conclude that $f$ is differentiable almost everywhere on the union of the
corresponding $K_k$, hence almost everywhere on $\Omega$.

Let $D$ be the set of points of density of $K$, and let $A$ be the set described
in Exercise 4. According to Exercise 8, Section 3, $K - D$ has measure 0.
According to the induction hypothesis, each section of $\Omega - A$ has
$(n-1)$-dimensional measure 0, so by Exercise 4 and Fubini, $\Omega - A$ has measure
0. Therefore, $K - (A \cap D)$ has measure 0, and we will show that $f$ is
differentiable at each point $a$ of $A \cap D$. Henceforth, $a$ is fixed in $A \cap D$.


LEMMA 5.2  For each $\varepsilon$ there is a $\delta'$ such that if $y \in K$ and $|y - a| < \delta'$, then

$$|f(y', y_n) - f(y', a_n) - D_n f(a)(y_n - a_n)| \le \varepsilon|y - a|. \tag{4}$$

Proof of the Lemma  Set $h = a_n - y_n$. By the uniform convergence in (2), there is a $\delta'$ such
that (4) holds if $D_n f(a)$ is replaced by $D_n f(y)$. By the uniform
continuity of $D_n f$ on $K$, this replacement makes no difference.


LEMMA 5.3  For each $\varepsilon$ there is a $\delta'$ such that if $y \in K$ and $|y - a| < \delta'$, then

$$|f(y) - f(a) - \langle \nabla f(a), y - a \rangle| \le \varepsilon|y - a|.$$

Proof of the Lemma  Write the left side in the form

$$\bigl[f(y', y_n) - f(y', a_n) - D_n f(a)(y_n - a_n)\bigr]
+ \Bigl[f(y', a_n) - f(a', a_n) - \sum_{i=1}^{n-1} D_i f(a)(y_i - a_i)\Bigr].$$

The first term is covered by Lemma 5.2, the second by the fact that the
section $f_{a_n}$ is differentiable at $a'$.


LEMMA 5.4  For each $\varepsilon$ there is a $\delta'$ such that if $|x - a| < \delta'$, then

$$B(x, \varepsilon|x - a|) \cap K \ne \emptyset.$$

Proof of the Lemma  Let $r = (1 + \varepsilon)|x - a|$. If the intersection is empty, then

$$|B(a, r)| \ge |B(a, r) \cap K| + |B(x, \varepsilon|x - a|)|,$$

so that

$$\frac{|B(a, r) \cap K|}{|B(a, r)|} \le 1 - \Bigl(\frac{\varepsilon}{1 + \varepsilon}\Bigr)^n,$$

which is impossible for small $r$, since $a$ is a point of density of $K$.


Back to Proof of the Theorem  Let $\varepsilon > 0$ be given, let $\delta'$ be chosen in accordance with Lemmas 5.3
and 5.4, and let $\delta < \delta'/(1 + \varepsilon)$. If $|x - a| < \delta$, use Lemma 5.4 to
choose $y \in K$ with $|y - x| \le \varepsilon|x - a|$, so that

$$|y - a| \le (1 + \varepsilon)|x - a| < \delta'.$$

Then we have

$$\begin{aligned}
|f(x) - f(a) - \langle \nabla f(a), x - a \rangle|
&\le |f(x) - f(y)| + |\langle \nabla f(a), x - y \rangle| + |f(y) - f(a) - \langle \nabla f(a), y - a \rangle| \\
&\le 2M|x - y| + \varepsilon|y - a| \\
&\le 2M\varepsilon|x - a| + \varepsilon(1 + \varepsilon)|x - a| \\
&= (2M + 1 + \varepsilon)\,\varepsilon|x - a|,
\end{aligned}$$

which shows that $f$ is differentiable at $a$.


15  Surface Area


1  AREA MEASURES


In this chapter we shall develop formulas for the $m$-dimensional area of sets in
$\mathbf{R}^n$ and for the area of $m$-dimensional parametric surfaces $\varphi: \mathbf{R}^m \to \mathbf{R}^n$. It is
assumed of course that $m \le n$. When $m = n$ the theory is just the theory of
Lebesgue measure, and the formula for the area of the parametric surface $\varphi$ is
just the change of variable formula of the last section (with $f = 1$):

$$|\varphi(\Omega)| = \int_\Omega |\det d\varphi(x)|\,dx.$$

(In this case, of course, we do not ordinarily think of area but rather of volume,
and we think of $\varphi$ as a solid rather than a surface. When $m < n$ it is more
natural to think of surfaces and areas.)

When $m < n$ the right way to define the $m$-dimensional area of a set is to
use the $m$-dimensional Hausdorff measure $\mu_m$ that was introduced in Section 3
of Chapter 13. The reason for this is the following theorem.


THEOREM 1.1  For each $m$ there is a constant $\gamma_m$ such that

$$\mu_m(A) = \gamma_m|A|_m \quad\text{if } A \subset \mathbf{R}^m.$$


Exercise 1  Go back to Section 3 of Chapter 13 to review the definition and elementary
properties of the Hausdorff measures. Then show that $\mu_m$ is an absolutely
continuous regular Borel measure on $\mathbf{R}^m$. (It is not true that $\mu_m$ is a regular
Borel measure on $\mathbf{R}^n$ if $m < n$, for compact sets do not have finite measure.)


Proof of the Theorem  Since $\mu_m$ is an absolutely continuous regular Borel measure, Theorem 3.4
of the last chapter shows that it is the integral of its derivative. But the
derivative is plainly constant, for $\mu_m(E + a) = \mu_m(E)$ for any set $E$ and
any point $a$. Thus, $\gamma_m$ is just the constant $D\mu_m$.


DEFINITION 1.2  In any metric space we shall write

$$|A|_m = \alpha_m(A) = \frac{1}{\gamma_m}\,\mu_m(A)$$

and shall call $\alpha_m$ the $m$-dimensional area measure on the metric space.


In order to see that this definition is acceptable, suppose first that $A \subset \mathbf{R}^n$
is a subset of an $m$-dimensional subspace $V$ of $\mathbf{R}^n$. Any choice of an orthonormal
basis of $V$ turns $V$ into $\mathbf{R}^m$ and determines, therefore, a Lebesgue measure on $V$
which gives the natural $m$-dimensional area of $A$. Theorem 1.1 shows that this
is just $\alpha_m(A)$.


Exercise 2  How does Exercise 8, Section 3 of Chapter 13, come in here?


It is plain that the $m$-dimensional area of a set $A$ ought to be the same as
that of the translate $A - a$, and it is plain that $\alpha_m$ has this property too.
Therefore, $\alpha_m(A)$ is the right thing if $A$ is contained in any $m$-dimensional plane.
Finally, if $A$ is contained in a finite union of distinct $m$-dimensional planes $\Pi_i$,
then the $m$-dimensional area of $A$ ought to be the sum of areas of the $A \cap \Pi_i$.


Exercise 3  Show that if $A$ is contained in the union of the $m$-dimensional planes $\Pi_i$, then

$$\alpha_m(A) = \sum \alpha_m(A \cap \Pi_i).$$

(Even for a sequence, as a matter of fact.) (Hint: You will have to show that
each $\Pi_i$ is $\alpha_m$ measurable and that the intersections $\Pi_i \cap \Pi_j$ have measure 0.)


This shows that $\alpha_m(A)$ does coincide with the intuitive notion of the
$m$-dimensional area whenever $A$ is contained in a finite union of planes of dimension
$m$, which is about as far as intuition carries.


Exercise 4  The value of the constant $\gamma_m$ is not important, but it is an interesting exercise
to calculate it.


THEOREM 1.3  If $\varphi: X \to Y$ satisfies $d(\varphi(x), \varphi(y)) \le M\,d(x, y)$, then for any set $A \subset X$ we have

$$|\varphi(A)|_m \le M^m|A|_m.$$

Exercise 5  Prove the theorem.


COROLLARY 1.4  Any compact subset of a smooth $m$-dimensional manifold has finite
$m$-dimensional area.


Exercise 6  Prove the corollary, and also the following one.

COROLLARY 1.5  Any smooth manifold of dimension $< m$ has $m$-dimensional area 0.

Now we shall develop some formulas for the area of polyhedra.


THEOREM 1.6  If $T: \mathbf{R}^m \to \mathbf{R}^n$ is a linear transformation, then

$$|T(A)|_m = \sqrt{\det T^*T}\;|A|_m.$$

Proof  If $T$ is not one to one, then the left side is 0 because the range of $T$ is a
subspace of dimension $< m$; the right side is 0 because $T^*T$ is not one to
one either. If $T$ is one to one, then we can write $T = UH$, where $H =
\sqrt{T^*T}$ is self-adjoint from $\mathbf{R}^m$ to $\mathbf{R}^m$, and $U = TH^{-1}$ is orthogonal from
$\mathbf{R}^m$ to $\mathbf{R}^n$.

Now, if $U: \mathbf{R}^m \to \mathbf{R}^n$ is orthogonal, then the sets $E$ and $U(E)$ have
precisely the same diameter, so it is clear that

$$|U(A)|_m = |A|_m.$$

Therefore, by Theorem 4.2 of Chapter 14, we have

$$|T(A)|_m = |H(A)|_m = \det H\,|A|_m,$$

and since $H^2 = T^*T$, we have $(\det H)^2 = \det T^*T$.


DEFINITION 1.7  Let $v_0, \ldots, v_m$ be affinely independent points in $\mathbf{R}^n$. The $m$-dimensional
simplex with vertices $v_0, \ldots, v_m$ is the set

$$\sigma = \Bigl\{x : x = \sum_{i=0}^{m} t_i v_i \text{ with } 0 \le t_i \le 1 \text{ and } \sum_{i=0}^{m} t_i = 1\Bigr\}.$$


Recall (Section 2 of Chapter 9) that $v_0, \ldots, v_m$ are affinely independent
if and only if $v_1 - v_0, \ldots, v_m - v_0$ are linearly independent. When $m = 1$
the simplex is simply the line segment with vertices $v_0$ and $v_1$. When $m = 2$ it is
the triangle with vertices $v_0$, $v_1$, and $v_2$. When $m = 3$ it is the tetrahedron with
vertices $v_0$, $v_1$, $v_2$, and $v_3$. And so on.

Consider first the unit $m$-dimensional simplex $\sigma_m$ in $\mathbf{R}^m$, that is, the one with
vertices $0, e_1, \ldots, e_m$. It is easy to see that the section through the point
$x_m = t$ is given by

$$(\sigma_m)_t = (1 - t)\,\sigma_{m-1}.$$

Therefore, Fubini's theorem gives

$$|\sigma_m|_m = \int_0^1 |(\sigma_m)_t|_{m-1}\,dt = |\sigma_{m-1}|_{m-1} \int_0^1 (1 - t)^{m-1}\,dt = \frac{1}{m}\,|\sigma_{m-1}|_{m-1}, \tag{1}$$

and then induction gives

$$|\sigma_m|_m = \frac{1}{m!}. \tag{2}$$


Exercise 7  In formula (1) we used the fact that $|(1 - t)\sigma_{m-1}|_{m-1} = (1 - t)^{m-1}|\sigma_{m-1}|_{m-1}$.
This follows from Theorem 1.6 by considering the linear transformation
$T: \mathbf{R}^{m-1} \to \mathbf{R}^{m-1}$ given by $Tx = \rho x$. However, the same idea has been used
before back in Exercise 13, Section 2 of Chapter 13.

Now consider an arbitrary $\sigma$ with vertices $v_0, \ldots, v_m$. If $T: \mathbf{R}^m \to \mathbf{R}^n$ is
the linear transformation with $Te_i = v_i - v_0$, then $T(\sigma_m) = \sigma - v_0$; therefore,

$$|\sigma|_m = |\sigma - v_0|_m = |T(\sigma_m)|_m = \sqrt{\det T^*T}\;|\sigma_m|_m = \frac{1}{m!}\sqrt{\det T^*T}.$$

The matrix of $T$ has the coordinates of $Te_j = v_j - v_0$ down the $j$th column.
Therefore, the matrix of $T^*$ has the coordinates of $v_i - v_0$ along the $i$th row.
Consequently, the matrix of $T^*T$ has $(v_i - v_0, v_j - v_0)$ in the $i$th row, $j$th
column, and we get the following result:


THEOREM 1.8  If $\sigma$ is the $m$-dimensional simplex in $\mathbf{R}^n$ with vertices $v_0, \ldots, v_m$, then

$$|\sigma|_m = \frac{1}{m!}\sqrt{\det\{(v_i - v_0, v_j - v_0)\}}.$$


Exercise 8  Find the area of the triangle in $\mathbf{R}^3$ with vertices $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$.


When $m = n$ we can use $T$ directly instead of $T^*T$, for $\det T^*T = |\det T|^2$.
If we write $(w_1, \ldots, w_m)$ for the matrix that has the coordinates of $w_1$ down
the first column, those of $w_2$ down the second column, and so on, then we get
the following:


THEOREM 1.9  If $\sigma$ is the $m$-dimensional simplex in $\mathbf{R}^m$ with vertices $v_0, \ldots, v_m$, then

$$|\sigma|_m = \frac{1}{m!}\,|\det(v_1 - v_0, \ldots, v_m - v_0)|.$$
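The two simplex formulas lend themselves to a short computation. The sketch below is illustrative only; `det` and `simplex_area` are hypothetical helper names, and the determinant is computed by Gaussian elimination over exact fractions so that no external library is assumed. It evaluates the Gram-determinant formula of Theorem 1.8 and, for a triangle in the plane, compares it with the determinant formula of Theorem 1.9.

```python
from fractions import Fraction
from math import factorial, sqrt

def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return sign * result

def simplex_area(vertices):
    """(1/m!) * sqrt(det of the Gram matrix of the edges v_i - v_0)."""
    v0, rest = vertices[0], vertices[1:]
    edges = [[p - q for p, q in zip(v, v0)] for v in rest]
    gram = [[sum(x * y for x, y in zip(e, f)) for f in edges] for e in edges]
    return sqrt(det(gram)) / factorial(len(rest))

# Triangle in R^3 with vertices on the coordinate axes.
area_in_space = simplex_area([(1, 0, 0), (0, 1, 0), (0, 0, 1)])

# Triangle in R^2: the Gram formula against |det(v1 - v0, v2 - v0)| / 2!.
plane_triangle = [(0, 0), (2, 0), (0, 3)]
area_gram = simplex_area(plane_triangle)
area_det = abs(det([[2, 0], [0, 3]])) / factorial(2)
```

For the triangle in $\mathbf{R}^3$ the edge vectors are $(-1, 1, 0)$ and $(-1, 0, 1)$, the Gram determinant is 3, and the area is $\sqrt{3}/2$.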


Still in the same situation (that is, m = n), there is another interesting 
formula that is obtained in the following way: Letzr be the (m + 1)-dimensional 
simplex in R™+! with vertices 0, (1, vo), . . . , (1, um), where (1, v) is the vector 
with first coordinate 1 and the rest equal to those of v. What we are doing here 
is moving g up to the plane x9 = 1 and then joining it to0. For the section 
through the point x» = ¢, we have 


T, = lo; 


THEOREM 
1.10 


DEFINITION 
1.11 


DEFINITION 
1.12 


Remark 1 


Remark 2 


Remark 3 


area measures 375 


therefore, 
; lolm 
m+l = fo. 
lees = feel dt = 
Using Theorem 1.9 to calculate {r}41 we get 
If o 1s the m-dimensional simplex in R™ with vertices vo, . . . , Um, then 


det fea. : 
Uo Um 


(The notation means that the first column of the matrix is composed of a 1, and 
then the coordinates of vo, and so on.) 

The theorems on simplexes lead immediately to the possibility of calcu- 
lating the area of any polyhedron—just because of the definition of the term 
polyhedron. 


lol, = — 
mi 


A face of the simplex with vertices vo, . . . » Um is a simplex whose vertices 
are among these. 


An m-dimensional polyhedron 1s a finite union of simplexes of dimension m 
such that the intersection of any two ts either empty or is a face of both. 


If o and 7 are simplexes in a polyhedron of dimension m, then ¢/\r is 
contained in a plane of dimension m — 1, so j¢(\7|m = 0. Therefore, the 
m-dimensional area of the polyhedron is the sum of the m-dimensional areas of 
the simplexes of which it is a union. 


A given polyhedron may be cut up into simplexes in many different ways, but 
the argument above shows that the sum of the areas always remains the same— 
the area of the polyhedron itself. 


In the definition of a polyhedron it is not really necessary to insist that the 
intersection of any two simplexes be either empty or a face of both. The reason 
is that the union of any two simplexes can be cut into smaller simplexes that do 
satisfy this condition. We shall not go into this, however. 


Often a polyhedron of dimension m is defined to be a union of simplexes of
dimension ≤ m, with at least one of dimension = m. Since a simplex of dimension
< m has m-dimensional area 0, the ones of dimension < m do not enter the
picture. (That is, in area problems they do not. In other kinds of problems
they do.)




15/surface area 


2 


DEFINITION 
2.1 


PARAMETRIC SURFACES—INTRODUCTORY REMARKS 


An m-dimensional parametric surface in $R^n$ is a continuous function $\varphi: R^m \to R^n$
defined on some set $E \subset R^m$. The surface is not the set $\varphi(E)$, but if $\varphi$ is one to
one, then it is natural to define the m-dimensional area of $\varphi$ to be the m-dimensional
area of the set $\varphi(E)$. If $\varphi$ is not one to one, this will not do. For instance,
the path to the dentist’s office and home again does not have the same length
as the path one way, in spite of the fact that both have the same range. To get
the length of the round trip each point of the range must be counted twice.


Let $\varphi: R^m \to R^n$ be continuous on the set $E \subset R^m$. For each point $y \in R^n$,
let $N(E; y)$ be the number (possibly $+\infty$) of points $x \in E$ with $\varphi(x) = y$.
The area of the surface $\varphi$ is the number

\[ \operatorname{area} \varphi = \int N(E; y) \, d\alpha_m. \tag{1} \]
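As a small worked instance of formula (1), the round trip can be modeled by a path on $E = [0, 2]$ that runs from 0 to 1 and back. The particular function below is an assumption made here for illustration, and $\alpha_1$ on the line is taken to be ordinary length.

```latex
\[
\varphi(t) = \begin{cases} t, & 0 \le t \le 1, \\ 2 - t, & 1 \le t \le 2, \end{cases}
\qquad
N(E; y) = \begin{cases} 2, & 0 \le y < 1, \\ 1, & y = 1, \\ 0, & \text{otherwise,} \end{cases}
\]
\[
\operatorname{area} \varphi = \int N(E; y) \, d\alpha_1 = 2 \cdot |[0, 1)|_1 = 2,
\qquad \text{while} \qquad |\varphi(E)|_1 = |[0, 1]|_1 = 1 .
\]
```

The multiplicity function counts each point of the range twice (except the turning point), which is exactly the bookkeeping asked for in the round-trip remark.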


Consider more closely the case where $\varphi$ is one to one. In this case $N(E; y)$
is 1 if $y \in \varphi(E)$ and is 0 otherwise; so $N(E; y)$ is the characteristic function of the
set $\varphi(E)$, and formula (1) gives

\[ \operatorname{area} \varphi = |\varphi(E)|_m \quad \text{if } \varphi \text{ is one to one.} \tag{2} \]

Furthermore, if

\[ \nu(A) = |\varphi(A)|_m, \]

then under some fairly mild additional conditions $\nu$ is an absolutely continuous
regular Borel measure, and the results of Chapter 14 give

\[ \operatorname{area} \varphi = |\varphi(E)|_m = \int_E D\nu \, dx. \]

Thus, the problem is reduced to that of calculating the derivative

\[ D\nu(x) = \lim_{r \to 0} \frac{|\varphi(B(x; r))|_m}{|B(x; r)|_m}. \tag{3} \]

The limit on the right, which is important whether $\varphi$ is one to one or not,
is called the Jacobian of $\varphi$ at x and is written $J_\varphi(x)$. The fundamental theorem
is that if $\varphi$ is of class $C^1$ at a point a, then the Jacobian of $\varphi$ at the point a is
the same as the Jacobian of its differential. Thus, by Theorem 1.6,

\[ J_\varphi(a) = J_{d\varphi(a)} = \sqrt{\det d\varphi(a)^* \, d\varphi(a)}. \]

(This shows the relation between the Jacobian and the Jacobi matrix and
explains the name.)


Remark 1 


Exercise 1 


Remark 2 




Combining the various formulas above and supposing first that $\varphi$ is one to
one, we get

\[ \operatorname{area} \varphi = \int N(E; y) \, d\alpha_m = \int_E J_\varphi(x) \, dx. \tag{4} \]

It is reasonable to suspect, and it turns out to be true under the mild additional
conditions on $\varphi$, that the integral on the right performs just the same counting
process as does $N(E; y)$. The formula is correct whether $\varphi$ is one to one or not
and is the basic formula for surface area.


There is a theory of area that applies to functions $\varphi$ that are simply continuous.
Here we shall have to assume more than that, but the results will still have quite
good generality. The continuous theory is extremely subtle and difficult.
Formulas like (4) do not make sense in general, and even when they do they
are not true. Take, for instance, a continuous increasing function $\varphi: R^1 \to R^1$
on [0, 1] with $\varphi(0) = 0$, $\varphi(1) = 1$, and $\varphi'(x) = 0$ a.e., for example, the Cantor
function. It is clear that the length of $\varphi$ should be at least the length of the
segment [0, 1], that is, at least 1. But the integral of $J_\varphi(x) = \varphi'(x)$ is 0, since
$\varphi'(x) = 0$ a.e.


In this last discussion we are looking at $\varphi$ as a path on the line itself. It is a
little more striking to look at the graph of $\varphi$ as a path in the plane. Do this.


In the early days it was thought that the theory of surface area was very much
like the theory of length, and, in particular, that the area should be defined as
the limit of the areas of approximating polyhedra. The approximating
polyhedra are obtained as follows: Suppose that the parametric surface $\varphi$ is
defined on a polyhedron; to be definite, let us say on a rectangle Q. Cut Q
into small simplexes such that the intersection of any two is empty or is a common
face. A small piece of Q might look as shown in Figure 1.


Figure 1




Exercise 2 


3


DEFINITION 
3.1 


Let $\sigma$ be one of the simplexes in this “triangulation” of Q, and let it have
vertices $a_0, \ldots, a_m$. Each point $x \in \sigma$ can be written uniquely in the form

\[ x = \sum_{j=0}^{m} t_j a_j \quad \text{where } t_j \ge 0 \text{ and } \sum_{j=0}^{m} t_j = 1. \]

Define

\[ \Phi(x) = \sum_{j=0}^{m} t_j \varphi(a_j). \]

Then $\Phi(\sigma)$ is the simplex with vertices $\varphi(a_0), \ldots, \varphi(a_m)$, and it is not hard to
see that $\Phi$ is a well-defined continuous function on Q. $\Phi$ is an inscribed parametric
polyhedron that approximates the initial $\varphi$ if the simplexes are small enough.

It was thought initially that the area of $\Phi$ should approximate the area of $\varphi$,
but consider the simple example of a cylinder given by the function

\[ \varphi(s, t) = (\cos s, \sin s, t), \quad 0 \le s \le 2\pi, \; 0 \le t \le 1. \]

This just wraps the rectangle Q, $0 \le s \le 2\pi$, $0 \le t \le 1$, around into a cylinder
whose base is the circle of radius 1 and whose height is 1. If we cut Q into
simplexes (= triangles) where the base is parallel to the s axis and is long relative
to the height, then the corresponding simplexes $\Phi(\sigma)$ are nearly perpendicular
to the cylinder. It suddenly appears very unreasonable that the area of $\Phi$ is an
approximation to the area of the cylinder. The remarkable fact is the following:


For every number $a > 2\pi$ (= the area of the cylinder) and every number
$\varepsilon > 0$, there is a triangulation of the rectangle into simplexes of diameter $< \varepsilon$
such that for the corresponding inscribed polyhedron $\Phi$ we have

\[ |\operatorname{area} \Phi - a| < \varepsilon. \]


In other words, every number > the true area is the limit of the areas of approximating
inscribed polyhedra!
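The cylinder example can be explored numerically. The sketch below uses one particular staggered triangulation of Q, chosen here for illustration (it is an assumption, not a construction taken from the text), and the names `lantern_area`, `n`, and `k` are hypothetical. Each band of height 1/k holds triangles whose bases, of length 2π/n, are parallel to the s axis, so making k large relative to n produces the long, flat triangles described above.

```python
import math


def phi(s, t):
    """The cylinder map phi(s, t) = (cos s, sin s, t) from the text."""
    return (math.cos(s), math.sin(s), t)


def triangle_area(p, q, r):
    """Area of the triangle in R^3 with vertices p, q, r (half the cross-product norm)."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return 0.5 * math.sqrt(sum(c * c for c in cross))


def lantern_area(n, k):
    """Area of the inscribed polyhedron for a staggered triangulation of Q.

    Q is cut into k bands of height 1/k. In each band the vertices on one edge
    are shifted by half a step relative to the other edge, so the band holds
    2n - 1 full triangles with base 2*pi/n parallel to the s axis, plus two
    half-width triangles at the ends s = 0 and s = 2*pi. By symmetry all full
    image triangles are congruent, and so are the two half ones.
    """
    h = 1.0 / k
    step = 2.0 * math.pi / n
    full = triangle_area(phi(0.0, 0.0), phi(step, 0.0), phi(step / 2.0, h))
    half = triangle_area(phi(0.0, 0.0), phi(0.0, h), phi(step / 2.0, h))
    return k * ((2 * n - 1) * full + 2 * half)


cylinder = 2.0 * math.pi
for n, k in [(100, 100), (100, 100**2), (100, 100**3)]:
    print(n, k, lantern_area(n, k), cylinder)
```

With k comparable to n the printed area stays close to 2π, while with k of the order of n² or n³ it is far larger, even though the triangles in Q all have small diameter.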


Once this example was discovered, Lebesgue proposed to define the area 
to be the limit inferior of the areas of approximating inscribed polyhedra. This 
definition turned out to be satisfactory—but so difficult to work with that it 
took nearly 50 years to prove that it was satisfactory. As a matter of fact, any 
definition of area is hard to work with unless the function $\varphi$ is pretty nice.


THE JACOBIAN 


If $\varphi: R^m \to R^n$, $m \le n$, then the Jacobian of $\varphi$ at the point a is the number

\[ J_\varphi(a) = \lim_{r \to 0} \frac{|\varphi(B(a; r))|_m}{|B(a; r)|_m} \]

at any point where the limit exists.


THEOREM 
3.2 


Proof When $T = d\varphi(a)$
Is Nonsingular 




It is not required that $\varphi$ be defined everywhere on the ball $B(a; r)$. If $\varphi$ is
defined on a set E, and A is any set, then $\varphi(A)$ means simply $\varphi(A \cap E)$. For
instance, if $\varphi$ is the identity function on an interval E in $R^1$, then $J_\varphi(a)$ is 1 if a
is in the interior of E, 0 if a is outside the closure, and 1/2 if a is one of the end
points. As a matter of fact, however, we are interested mainly in the case
where a is an interior point of E, in which case $\varphi$ is defined everywhere on
$B(a; r)$ if r is small enough. The purpose of the section is to prove the following
basic result, which has already been established when m = n in Theorems 4.2
and 4.3 of Chapter 14.


If $\varphi: R^m \to R^n$ is of class $C^1$ at a, then

\[ J_\varphi(a) = J_{d\varphi(a)} = \sqrt{\det d\varphi(a)^* \, d\varphi(a)}. \tag{1} \]
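To see what the right side of formula (1) looks like in practice, here is a minimal sketch for surfaces with one or two parameters. The function names and the cone example are assumptions introduced here for illustration; only the cylinder map appears in the text.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_jacobian(columns):
    """sqrt(det(A* A)) for the matrix A whose columns are the given vectors.

    Only m = 1 or m = 2 columns are handled, to keep the determinant explicit.
    """
    if len(columns) == 1:
        return math.sqrt(dot(columns[0], columns[0]))
    if len(columns) == 2:
        u, v = columns
        # det of the 2 x 2 Gram matrix [[u.u, u.v], [u.v, v.v]]
        return math.sqrt(max(0.0, dot(u, u) * dot(v, v) - dot(u, v) ** 2))
    raise ValueError("this sketch handles only m = 1 or m = 2")


def d_cylinder(s, t):
    """Columns of the Jacobi matrix of phi(s, t) = (cos s, sin s, t)."""
    return [(-math.sin(s), math.cos(s), 0.0), (0.0, 0.0, 1.0)]


def d_cone(s, t):
    """Columns of the Jacobi matrix of phi(s, t) = (t cos s, t sin s, t)."""
    return [(-t * math.sin(s), t * math.cos(s), 0.0),
            (math.cos(s), math.sin(s), 1.0)]


print(gram_jacobian(d_cylinder(0.7, 0.3)))  # the cylinder has Jacobian 1 everywhere
print(gram_jacobian(d_cone(1.1, 2.0)))      # the cone has Jacobian sqrt(2) * t
print(gram_jacobian(d_cone(0.4, 0.0)))      # at the vertex the differential is singular
```

For the cylinder the Jacobian is identically 1, which is consistent with the area 2π of the rectangle Q. At the vertex of the cone the differential has rank 1 and the Jacobian is 0.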


The proof is rather long. We shall take up first the case where $T = d\varphi(a)$
is nonsingular (i.e., is one to one). This is really the hard case, but it will seem 
easier because we have some heavy artillery to bring to bear. 


In this case the range of T has dimension m, and we can choose the
coordinates in $R^n$ so that it is spanned by the first m basis vectors. If
$y \in R^n$, we write $y = (y', y'')$, where $y'$ is the first m coordinates and $y''$
is the last n − m, and also $\varphi(x) = (\varphi'(x), \varphi''(x))$, and so on. It is easy to
check that with this choice of coordinates we have

\[ (T')^* T' = T^* T \quad \text{and} \quad T'' = 0. \tag{2} \]

Moreover, we already know the theorem for $\varphi'$ and $T'$ (Theorems 4.2
and 4.3 of Chapter 14), so we have

\[ J_T = J_{T'} = J_{\varphi'}(a). \tag{3} \]

Therefore, it will suffice to prove that

\[ \lim_{r \to 0} \frac{|\varphi(B(a; r))|_m}{|\varphi'(B(a; r))|_m} = 1, \tag{4} \]

for this will give that $J_\varphi(a) = J_{\varphi'}(a)$.
If P is the projection of $R^n$ on $R^m$ defined by $Py = y'$, then by definition
we have $\varphi' = P \circ \varphi$. Since P clearly satisfies $|Px - Py| \le |x - y|$,
Theorem 1.3 gives

\[ |\varphi'(E)|_m \le |\varphi(E)|_m \quad \text{for any set } E, \]

and this proves half of (4): that the limit inferior is $\ge 1$.
To prove the other half we shall use the same idea; that is, we shall
show that if r is small enough, then the function f that takes $\varphi'(x)$ to $\varphi(x)$
satisfies $|f(z) - f(w)| \le (1 + \eta)|z - w|$ on $\varphi'(B(a; r))$, with $\eta > 0$ as small
as we please. Once this is done it will follow again from Theorem 1.3 that

\[ |\varphi(B(a; r))|_m \le (1 + \eta)^m \, |\varphi'(B(a; r))|_m, \]

which will prove the other half of (4). The basis for this is Theorem 5.3
of Chapter 10. If $\varepsilon > 0$ is given, then we can find $\delta > 0$ such that

\[ |\varphi(x) - \varphi(y) - T(x - y)| \le \varepsilon |x - y| \quad \text{if } x, y \in B(a; \delta). \tag{5} \]

By the choice of the coordinates, $T'$ is nonsingular, so there is a number
$m > 0$ such that $|T'x| \ge m|x|$; hence, [by (5)],

\[ |\varphi'(x) - \varphi'(y)| \ge (m - \varepsilon)|x - y|, \tag{6} \]

and, of course, we take the precaution to pick $\varepsilon < m$. This shows that
$\varphi'$ is one to one on $B(a; \delta)$ and, therefore, that f is well defined. Again by
the choice of the coordinates, $T'' = 0$; so, [by (5)],

\[ |\varphi''(x) - \varphi''(y)| \le \varepsilon |x - y| \quad \text{if } x, y \in B(a; \delta). \tag{7} \]

Combining (6) and (7) we get

\[
\begin{aligned}
|\varphi(x) - \varphi(y)|^2 &= |\varphi'(x) - \varphi'(y)|^2 + |\varphi''(x) - \varphi''(y)|^2 \\
&\le |\varphi'(x) - \varphi'(y)|^2 + \varepsilon^2 |x - y|^2 \\
&\le |\varphi'(x) - \varphi'(y)|^2 \left( 1 + \frac{\varepsilon^2}{(m - \varepsilon)^2} \right),
\end{aligned}
\]

which is equivalent to the required inequality for f. When m = 2, n = 3
and $a = \varphi(a) = 0$, the picture looks as shown in Figure 2.


Figure 2 


Proof When $T = d\varphi(a)$
Is Singular 




Figure 3 


In this case the range of T has dimension $p < m$, and the picture looks
perhaps as shown in Figure 3. This time we choose the coordinates so
that the range of T is spanned by the first p basis vectors, and use primes
to denote the first p coordinates and double primes to denote the last
n − p. If $\varepsilon > 0$ is given, we choose $\delta > 0$ so that (5) holds. If
$M = \|T\| + 1$, this gives in particular that

\[ |\varphi(x) - \varphi(y)| \le M|x - y| \quad \text{if } x, y \in B = B(a; \delta); \tag{8} \]

and it follows from Theorem 1.3 that

\[ \text{if } N \subset B \text{ and } |N|_m = 0, \text{ then } |\varphi(N)|_m = 0. \tag{9} \]

The next step is to show that if $B(b; r) \subset B$, then

\[ \varphi(B(b; r)) - \varphi(b) \subset B'(0; Mr) \times B''(0; \varepsilon r), \tag{10} \]

where the primes indicate the balls in the respective spaces. Formula
(10) means that if $|x - b| < r$, then

\[ |\varphi'(x) - \varphi'(b)| < Mr \quad \text{and} \quad |\varphi''(x) - \varphi''(b)| < \varepsilon r. \]

The first of these follows from (8) and the second from (7). Formula
(10) will let us estimate $|\varphi(B(b; r))|_m$, but note that Fubini’s theorem
does not help when $m < n$.

It is a little easier to deal with cubes than balls, so we replace the ball
$B'(0; Mr)$ by the (larger) cube $Q'(0; 2Mr)$ with center 0 and side length
$2Mr$, and the ball $B''(0; \varepsilon r)$ by the cube $Q''(0; 2\varepsilon r)$. What we need to




LEMMA 
3.3 


4 


estimate, then, is $|Q|_m^{\eta} = \mu_m^{\eta}(Q)/\gamma_m$ for

\[ Q = Q'(0; s) \times Q''(0; t) \quad \text{with } s = 2Mr \text{ and } t = 2\varepsilon r. \]

Note that $t < s/2$ since $M \ge 1$ and $\varepsilon < 1/2$.
Let $l$ be the integer such that $(l - 1)t < s \le lt$, and divide $Q'$ in
$l^p$ cubes $Q'_k$ of side length $s/l$. If $Q_k = Q'_k \times Q''$, then each $Q_k$ is
contained in a cube of side length t, so $\delta(Q_k) \le \sqrt{n}\, t$; therefore, if
$\eta > \sqrt{n}\, t$, then

\[ \mu_m^{\eta}(Q) \le l^p n^{m/2} t^m. \]

Using the fact that $l - 1 < s/t = M/\varepsilon$ (hence $l < (M + \varepsilon)/\varepsilon$), and the
fact that $t = 2\varepsilon r$, we get

\[ \mu_m^{\eta}(Q) \le n^{m/2} 2^m (M + \varepsilon)^p \varepsilon^{m-p} r^m \quad \text{for } \eta > 2\sqrt{n}\, \varepsilon r. \]

Combining this with formula (10) we get the following lemma.


If $B(b; r) \subset B = B(a; \delta)$ and $\eta > 2\sqrt{n}\, \varepsilon r$, then

\[ |\varphi(B(b; r))|_m^{\eta} \le c\, \varepsilon^{m-p} |B(b; r)|_m, \]

where the constant c depends only on n, m, p, and M.


The crucial point here is that the constant does not depend on $\varepsilon$, $\eta$, r, and
so on, but only on the quantities n, m, p, and $M = \|T\| + 1$ that are fixed
from the very beginning.

Now take any fixed $\rho < \delta$ and any fixed $\eta$. Consider the family $\mathcal{F}$ of all
balls $B(b; r)$ such that $2\sqrt{n}\, \varepsilon r < \eta$ and $B(b; r) \subset B(a; \rho)$. This family
covers $B(a; \rho)$ in the sense of Vitali, so there is a disjoint sequence $\{B_i\}$
that covers $B(a; \rho)$ except for a set N of measure 0. Taking account of
the lemma and formula (9), we have

\[ |\varphi(B(a; \rho))|_m^{\eta} \le \sum_i |\varphi(B_i)|_m^{\eta} \le c\, \varepsilon^{m-p} \sum_i |B_i|_m = c\, \varepsilon^{m-p} |B(a; \rho)|_m. \]

Since this holds for each $\eta$, we get

\[ |\varphi(B(a; \rho))|_m \le c\, \varepsilon^{m-p} |B(a; \rho)|_m. \]

As this holds for every $\rho$, it follows that the limit superior in Definition 3.1
is $\le c\, \varepsilon^{m-p}$, and hence that the limit superior is $0 = J_T$.


ABSOLUTE CONTINUITY 


The notion of absolute continuity that was defined initially for functions from
$R^n$ to $R^n$ in Section 4 of the last chapter must be extended to the present setting.


DEFINITION 
4.1 


Exercise 1 


Exercise 2 


DEFINITION 
4.2 


Exercise 3 


THEOREM 
4.3 


Proof 


Exercise 4 


THEOREM 
4.4 




A function $\varphi: R^m \to R^n$ is absolutely continuous on a set A if it is defined and
continuous on A and if $|\varphi(N)|_m = 0$ whenever $N \subset A$ and $|N|_m = 0$.


If $\varphi$ is continuous on $A = \bigcup_{i=1}^{\infty} A_i$ and absolutely continuous on each $A_i$, then
$\varphi$ is absolutely continuous on A.


If $\varphi$ is locally absolutely continuous on A (i.e., if each point of A has a neighborhood
G such that $\varphi$ is absolutely continuous on $A \cap G$), then $\varphi$ is absolutely
continuous on A. (Hint: The neighborhood G can be taken to be a ball with
rational center and radius.)


A function $\varphi: R^m \to R^n$ is Lipschitzian on a set A if there is a constant M
such that $|\varphi(x) - \varphi(y)| \le M|x - y|$ for any two points x and y of A.
It is locally Lipschitzian if each point has a neighborhood G such that it is
Lipschitzian on $A \cap G$.


Consider $\varphi: X \to Y$, where X and Y are metric spaces, and show that if $\varphi$ is
locally Lipschitzian, then it is Lipschitzian on any compact subset of X.


If $\varphi: R^m \to R^n$ is $C^1$ on A, then it is locally Lipschitzian on A. If it is
locally Lipschitzian on A, then it is absolutely continuous on A.


Theorem 5.3 of Chapter 10 shows that if $\varphi$ is $C^1$ on A, then it is locally
Lipschitzian on A. Exercise 1 and Theorem 1.3 show that if $\varphi$ is locally
Lipschitzian, then it is absolutely continuous on A.


The usual way to show that a continuous function $\varphi$ is absolutely continuous
on a set A is to show that $\varphi$ is $C^1$ on A − N, while $|\varphi(N)|_m = 0$.


Consider the function $\varphi(t) = (t, t \sin(1/t))$, which is the natural parametric
representation of the graph of $x \sin 1/x$. Show that $\varphi$ is absolutely continuous
on $R^1$ but not locally Lipschitzian at 0. Show that the length of $\varphi$ is infinite on
any interval containing 0. (Use the old definition of length.)


It is tempting to think that if $\varphi$ is absolutely continuous, then $|\varphi(K)|_m < \infty$
for any compact set K. The example in the exercise shows that this is not
correct, for in this case the basic theorem of Section 6 will show that $|\varphi(I)|_1$ is
the length of $\varphi$ on I.


Let $\varphi: R^m \to R^n$ be absolutely continuous on a set $E_0$. If $E \subset E_0$ is
Lebesgue measurable, then $\varphi(E)$ is $\alpha_m$ measurable. If $A \subset E_0$ is arbitrary,
there is a $G_\delta$ set $E \supset A$ such that $|\varphi(E \cap E_0)|_m = |\varphi(A)|_m$.




Proof 


Exercise 5 


5


DEFINITION 
5.1 


THEOREM 
5.2 


Proof 


If E is Lebesgue measurable, then E is the union of a sequence of compact
sets and a set of measure 0. Then $\varphi(E)$ is also the union of a sequence of
compact sets and a set of measure 0. Indeed, $\varphi$ takes compacts into
compacts because it is continuous, and sets of measure 0 into sets of
measure 0 by the definition of absolute continuity.

To prove the other half we shall need the following result on the
Hausdorff measures.


For any $B \subset R^n$ there is a $G_\delta$ set $H \supset B$ with $|H|_m = |B|_m$. [Hint: The coverings
that are used to define $\mu_m^{\varepsilon}(B)$ can be required to be open. This can be
seen as follows: Let $\varepsilon > 0$ and $\rho > 1$ be given, and let A be any set with $\delta(A) < \varepsilon$.
Fix a positive $\delta$ and let $G = \{x : d(x, A) < \delta\}$. Then G is open and
$\delta(G) \le \delta(A) + 2\delta$. Consequently, if $\delta$ is small enough so that

\[ \delta(A) + 2\delta < \varepsilon \quad \text{and} \quad \delta(A) + 2\delta \le \rho\, \delta(A), \]

then $\delta(G) < \varepsilon$ and $\delta(G) \le \rho\, \delta(A)$. If we replace each $A_i$ in the covering by a
$G_i$ of this kind, then at worst we multiply the sum by $\rho^m$.]


To prove the other half of the theorem choose a $G_\delta$ set $H \supset \varphi(A)$ with
$|H|_m = |\varphi(A)|_m$. Let $H = \bigcap H_i$, where $H_i$ is open. Then $\varphi^{-1}(H_i)$ is open
in $E_0$ because $\varphi$ is continuous, so $\varphi^{-1}(H_i) = G_i \cap E_0$, where $G_i$ is open
in $R^m$. The set $E = \bigcap G_i$ does the job, since $\varphi(E \cap E_0) \subset H$.


VARIATION 


The area of a parametric surface $\varphi$ can be expressed in terms of the variation
of the function $\varphi$. There are several ways to define the variation. The one we
shall use was discovered by the Polish mathematician Stefan Banach.


The (Banach) variation of a function $\varphi: R^m \to R^n$ on a set $E \subset R^m$ is the
number

\[ V(\varphi; E) = \sup \sum_i |\varphi(E_i)|_m, \tag{1} \]

the upper bound being taken over all disjoint sequences $\{E_i\}$ of measurable
subsets of E.


If $\varphi: R^m \to R^n$ is absolutely continuous on the measurable set E, then

\[ V(\varphi; E) = \operatorname{area} \varphi = \int N(E; y) \, d\alpha_m. \]


Recall that $N(E; y)$ is the number of points $x \in E$ with $\varphi(x) = y$ and that
the second equality is just the definition of area that was given in Section 2.
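The equality of the variation with the integral of the multiplicity function can be checked numerically in the simplest case m = n = 1. The sketch below is an illustration under assumptions made here: the test function is the sine on [0, 2π], $\alpha_1$ is taken to be ordinary length, and the function names and grid sizes are arbitrary choices.

```python
import math


def partition_sum(phi, a, b, pieces):
    """Sum of |phi(E_i)| over a partition of [a, b] into equal intervals E_i.

    Assumes phi is monotone on each piece, so that phi(E_i) is the interval
    between the endpoint values. The sum is a lower bound for V(phi; [a, b]).
    """
    ts = [a + (b - a) * i / pieces for i in range(pieces + 1)]
    return sum(abs(phi(ts[i + 1]) - phi(ts[i])) for i in range(pieces))


def multiplicity_integral(phi, a, b, samples, levels, lo, hi):
    """Midpoint-rule estimate of the integral of N([a, b]; y) over lo < y < hi.

    N is estimated by counting sign changes of phi - y along a sample grid.
    """
    values = [phi(a + (b - a) * i / samples) for i in range(samples + 1)]
    dy = (hi - lo) / levels
    total = 0.0
    for j in range(levels):
        y = lo + (j + 0.5) * dy
        crossings = sum(
            1 for i in range(samples) if (values[i] - y) * (values[i + 1] - y) < 0
        )
        total += crossings * dy
    return total


v = partition_sum(math.sin, 0.0, 2 * math.pi, 4000)
area = multiplicity_integral(math.sin, 0.0, 2 * math.pi, 4000, 200, -1.0, 1.0)
print(v, area)  # both should be close to 4
```

The sine takes each value strictly between −1 and 1 exactly twice on [0, 2π), so the integral of the multiplicity is 2 · 2 = 4, which matches the sum of the lengths of the images of the monotone pieces.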




It is obvious that if $\{E_i\}$ is any disjoint sequence of measurable subsets of
E, then

\[ N(E; y) \ge \sum_i N(E_i; y) \quad \text{and} \quad \int N(E_i; y) \, d\alpha_m \ge |\varphi(E_i)|_m. \]

Since any nonnegative series can be integrated term by term, this gives

\[ \int N(E; y) \, d\alpha_m \ge \sum_i |\varphi(E_i)|_m; \]

hence

\[ \operatorname{area} \varphi \ge V(\varphi; E). \]

[There is one minor point left to be shown: that the multiplicity function
$N(E; y)$ is $\alpha_m$ measurable. This will be taken care of in the second half
of the proof.]

In the second half of the proof we shall make use of a construction
that will be needed again in Section 6. For each positive integer k, let
$\mathcal{F}_k$ be a disjoint sequence of measurable subsets of E of diameter $< 1/k$
that covers almost all of E. (Here $\mathcal{F}_k$ can be any such sequence, but in
Section 6 it will be a special one.) With the family $\mathcal{F}_k$ we can form
another multiplicity function $N_k(E; y)$ that approximates the initial one
by letting $N_k(E; y)$ be the number of sets $F \in \mathcal{F}_k$ such that $y \in \varphi(F)$. It
is easy to see that

\[ N_k(E; y) \le N(E; y) \quad \text{and} \quad N_k(E; y) \to N(E; y) \ \text{a.e. } \alpha_m. \tag{2} \]

The first part is self-evident. As for the second part, if $\mathcal{F}_k$ covers all of E
except for a set $M_k$ of measure 0 and $M = \bigcup M_k$, then the convergence
takes place except on the set $\varphi(M)$ which has $\alpha_m$ measure 0 by the absolute
continuity.

There is another way to look at $N_k(E; y)$. If $c_F$ denotes the characteristic
function of the set $\varphi(F)$, then

\[ N_k(E; y) = \sum_{F \in \mathcal{F}_k} c_F(y), \tag{3} \]

for there is a 1 in the sum on the right each time the point y belongs to
the set $\varphi(F)$. This formula shows that the $N_k(E; y)$ are $\alpha_m$ measurable,
and formula (2) shows that $N(E; y)$ is $\alpha_m$ measurable. Indeed, F is
Lebesgue measurable; so by Theorem 4.4, $\varphi(F)$ is $\alpha_m$ measurable.

The conditions in formula (2) guarantee that we can integrate term
by term. Indeed, if $N(E; y)$ is integrable, then the dominated convergence
theorem does the job. If it is not, then Fatou’s lemma gives

\[ \infty = \int N(E; y) \, d\alpha_m \le \liminf_{k \to \infty} \int N_k(E; y) \, d\alpha_m. \]




Exercise 1 


Exercise 2 


6 


THEOREM 
6.1 


Proof 


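Integrating term by term in (2) and then using (3) we get

\[ \int N(E; y) \, d\alpha_m = \lim_{k \to \infty} \int N_k(E; y) \, d\alpha_m = \lim_{k \to \infty} \sum_{F \in \mathcal{F}_k} |\varphi(F)|_m. \]

Each sum on the right is at most $V(\varphi; E)$, so we have

\[ \int N(E; y) \, d\alpha_m \le V(\varphi; E), \]

and the theorem is proved.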


A function $\varphi: R^1 \to R^1$ on a closed interval $I = [a, b]$ has finite variation on I
if and only if it is the difference of two increasing functions. [Hint: For one of
the two take the function $v(x) = V(\varphi; [a, x])$.]


Let $\varphi: R^1 \to R^1$ be absolutely continuous and have finite variation on an interval
$I = [a, b]$. Show that for each $\varepsilon > 0$ there is a $\delta > 0$ such that if $E \subset I$ and
$|E| < \delta$, then $V(\varphi; E) < \varepsilon$. (Hint: Use Exercise 1 and then Exercise 6, Section 3
of Chapter 14. The property given here is the classical property of absolute
continuity.)


THE JACOBIAN FORMULA FOR SURFACE AREA 


Now we are ready for the main theorem. 


If $\varphi: R^m \to R^n$, $m \le n$, is absolutely continuous on the measurable set E
and of class $C^1$ at almost every point, then

\[ \operatorname{area} \varphi = \int N(E; y) \, d\alpha_m = \int_E J_\varphi(x) \, dx. \tag{1} \]


The first equality is the definition of the area, so it is the second one that
has to be proved. Note first that if E is replaced by any subset D such
that $|E - D|_m = 0$, then both sides of (1) remain unchanged. It is
obvious that the right side remains unchanged. As for the left side, we
have $N(D; y) = N(E; y)$ at each point y not in the set $\varphi(E - D)$, which
has $\alpha_m$ measure 0 by the absolute continuity. Thus, $N(D; y) = N(E; y)$
a.e. $\alpha_m$, so the left side remains unchanged too. Now we shall replace E
by the subset on which $\varphi$ is of class $C^1$ and suppose from now on that $\varphi$ is of
class $C^1$ at each point of E. In this case Theorem 5.3 of Chapter 10 shows
that each point of E is the center of a ball on which $\varphi$ is differentiable and
Lipschitzian. If $\Omega$ is the union of these balls, then $\Omega$ is an open set containing
E, and $\varphi$ is differentiable at each point of $\Omega$ and is locally Lipschitzian
on $\Omega$.




Now we shall prove the theorem when $\varphi$ is one to one on $\Omega$. (In a
moment we shall apply this result to various subsets of $\Omega$ on which $\varphi$ is
actually one to one.) If $\varphi$ is one to one on $\Omega$, then the measure

\[ \nu(E) = |\varphi(E)|_m \]

is an absolutely continuous regular Borel measure on $\Omega$. The fact that $\varphi$
is locally Lipschitzian (hence Lipschitzian on compact sets) gives that
compact sets have finite measure and that $\nu$ is absolutely continuous.
Theorem 4.4 shows that each set is contained in a $G_\delta$ of the same measure.
Theorem 3.4 of the last chapter gives the formula

\[ |\varphi(E)|_m = \int_E D\nu \, dx, \]

and Theorem 3.2 shows that $D\nu(x) = J_\varphi(x)$ at each point $x \in E$. This
proves the theorem when $\varphi$ is one to one on $\Omega$.

Next we shall show that if $M = \{x \in E : J_\varphi(x) = 0\}$, then
$|\varphi(M)|_m = 0$, which will prove the theorem for the set M, for it will show
that $N(M; y) = 0$ a.e. $\alpha_m$. In doing this we can assume that M is
bounded, for if the assertion is true for each bounded part of M, then it is
true for M itself. Choose an open G with $M \subset G \subset \Omega$ and $|G|_m < \infty$,
and let $\varepsilon > 0$ be given. Let $\mathcal{F}$ be the family of balls that are contained
in G and satisfy $|\varphi(B)|_m \le \varepsilon |B|_m$. This family covers M in the sense of
Vitali; so there is a disjoint sequence $\{B_i\}$ in $\mathcal{F}$ that covers almost all of M,
and we have

\[ |\varphi(M)|_m \le \sum_i |\varphi(B_i)|_m \le \varepsilon \sum_i |B_i|_m \le \varepsilon |G|_m. \]

Since $\varepsilon > 0$ is arbitrary, it follows that $|\varphi(M)|_m = 0$.

What we have just done shows that $N(M; y) = 0$ a.e. $\alpha_m$. Therefore,
neither side of formula (1) changes if we replace the set E by the set
E − M, and we can assume from now on that $J_\varphi(x) \ne 0$ for each point
$x \in E$.

Fix a positive integer k and consider the family of balls B such that
$B \subset \Omega$, $\delta(B) < 1/k$, and $\varphi$ is one to one on B. Since $J_\varphi(x) \ne 0$ for each
$x \in E$, this family covers E in the sense of Vitali, and we can choose a
disjoint countable subfamily $\mathcal{G}_k$ that covers almost all of E. We shall
apply the results of the last section to the family $\mathcal{F}_k$ of sets of the form
$B \cap E$, where $B \in \mathcal{G}_k$. What has been done in the one-to-one case
(applied to $\Omega = B$) shows that for each $F \in \mathcal{F}_k$ we have

\[ \int N(F; y) \, d\alpha_m = \int_F J_\varphi(x) \, dx. \]

Summing on the $F \in \mathcal{F}_k$ and using formula (3) of the last section, we get

\[ \int N_k(E; y) \, d\alpha_m = \int_E J_\varphi(x) \, dx. \]




THEOREM 
6.2 


Proof 


Letting $k \to \infty$ and using formula (2) of the last section, we get the
required formula (1). [It is not really necessary to let $k \to \infty$ here.
The fact that $\varphi$ is one to one on each $F \in \mathcal{F}_k$ implies that $N(E; y) =
N_k(E; y)$ a.e. $\alpha_m$.]


The same argument leads to a Jacobian formula for surface integrals.


Note that when m = n and $\varphi$ is one to one it is just the old formula for changing
variables.


If $\varphi: R^m \to R^n$, $m \le n$, is absolutely continuous on the measurable set E
and of class $C^1$ at almost every point, then for every nonnegative $\alpha_m$ measurable
function f,

\[ \int f(y) N(E; y) \, d\alpha_m = \int_E f(\varphi(x)) J_\varphi(x) \, dx. \]


In the same way as before we discard the points where $\varphi$ is not $C^1$ and
then find an open set $\Omega \supset E$ on which $\varphi$ is locally Lipschitzian. Also in
the same way as before we discard the points where $J_\varphi = 0$ and then
assume that $J_\varphi \ne 0$ everywhere on E. Again this leads us to treat first
the case where $\varphi$ is one to one on $\Omega$, and it is only in this part that there
is a slight addition to the previous argument.

We have seen that the measure $\nu(E) = |\varphi(E)|_m$ is an absolutely
continuous regular Borel measure on $\Omega$, so by Theorem 3.4 of Chapter 14
we have

\[ \int g \, d\nu = \int g \, D\nu \, dx \]

for every nonnegative $\nu$ measurable function g. What we have to establish
is that if f is nonnegative and $\alpha_m$ measurable, then $g = f \circ \varphi$ is $\nu$ measurable
and

\[ \int f \, d\alpha_m = \int f \circ \varphi \, d\nu. \tag{2} \]

When we apply this formula and the last one to the function $f(y) N(E; y)$
we get exactly the theorem, for $N(E; y)$ is the characteristic function of
the set $\varphi(E)$ and $D\nu = J_\varphi$ on E.

It is evident that if the set $\varphi(E)$ is $\alpha_m$ measurable, then the set E is $\nu$
measurable. This shows that if f is $\alpha_m$ measurable, then $f \circ \varphi$ is $\nu$ measurable.
To get formula (2) take first the case where f is the characteristic
function of an $\alpha_m$ measurable set F. Then $f \circ \varphi$ is the characteristic function
of the set $E = \varphi^{-1}(F)$, and formula (2) is just the definition of $\nu$.
The general case results at once by taking first linear combinations of
characteristic functions to get all simple functions, and then increasing
limits of simple functions.


THEOREM 
6.3 


Exercise 1 


Exercise 2 


Exercise 3 


7 




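Now the proof is finished as before. What we have just proved shows
that for each set F in the family $\mathcal{F}_k$ we have

\[ \int f(y) N(F; y) \, d\alpha_m = \int_F f(\varphi(x)) J_\varphi(x) \, dx, \]

and summation over the $F \in \mathcal{F}_k$ gives

\[ \int f(y) N_k(E; y) \, d\alpha_m = \int_E f(\varphi(x)) J_\varphi(x) \, dx. \]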


Let us review Sard’s theorem briefly now that we have plenty of
information about surface area. Let N and M be smooth manifolds of
dimensions n and m, and let $f: N \to M$ be of class $C^1$ on N. The differential
$df(a)$ at a point $a \in N$ is a linear transformation from the tangent space
$T_a(N)$ into the tangent space $T_b(M)$, $b = f(a)$. The point a is a regular
point if $df(a)$ has the maximum possible rank, which means that $df(a)$ is
one to one if $n \le m$ and that $df(a)$ is onto if $n \ge m$. It is a critical point
otherwise. The point $y \in f(N)$ is a regular value of f if every point of the
set

\[ N_y = \{x \in N : f(x) = y\} \]

is a regular point.


(Sard’s Theorem) If $f: N \to M$ is of sufficiently high class $C^r$, then
almost every value of f is a regular value, where almost every refers of course to
the area measure $\alpha_m$ on M.


Prove the theorem. (Hint: This general case reduces immediately to the original
one by taking local parametric representations on N and M. All you have to
know is the following easy exercise.)


Let E be a subset of the smooth m-dimensional manifold M. The following
are equivalent:

(a) $|E|_m = 0$.

(b) For every local parametric representation $\varphi$, $|\varphi^{-1}(E)|_m = 0$.

(c) For each point $a \in M$ there is some local parametric representation $\varphi$
at a such that $|\varphi^{-1}(E)|_m = 0$.


We have mentioned that Sard’s theorem is interesting only when n > m. If 
n < m, then Theorem 6.1 gives a much better result. 


EXAMPLES 


Consider the length of a path $\varphi: R^1 \to R^n$ on an interval [a, b]. The Jacobi
matrix has the coordinates of $\varphi'$ arranged in a column. Its adjoint has the



Remark 


coordinates of $\varphi'$ arranged in a row. The product has just the single entry $|\varphi'|^2$.
Hence $J_\varphi = |\varphi'|$ and the result of Theorem 6.1 is that

\[ \text{arc length} = \int_a^b |\varphi'(t)| \, dt, \tag{1} \]

provided, of course, that $\varphi$ is absolutely continuous and of class $C^1$ a.e. on [a, b].
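Formula (1) is easy to try on smooth paths. The sketch below is only an illustration: the two paths, the midpoint rule, and the step count are choices made here and are not from the text. The second path runs twice around the unit circle, so its length counts every point of the range twice, in line with the multiplicity function of Section 2.

```python
import math


def arc_length(dphi, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of |phi'(t)| over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        velocity = dphi(a + (i + 0.5) * h)
        total += math.sqrt(sum(c * c for c in velocity)) * h
    return total


def helix_velocity(t):
    """Derivative of the helix phi(t) = (cos t, sin t, t)."""
    return (-math.sin(t), math.cos(t), 1.0)


def double_circle_velocity(t):
    """Derivative of phi(t) = (cos 2t, sin 2t), which circles twice on [0, 2*pi]."""
    return (-2.0 * math.sin(2.0 * t), 2.0 * math.cos(2.0 * t))


print(arc_length(helix_velocity, 0.0, 2 * math.pi))          # about 2*pi*sqrt(2)
print(arc_length(double_circle_velocity, 0.0, 2 * math.pi))  # about 4*pi
```

The range of the second path is just the unit circle, of length 2π, but the path itself has length 4π.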


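If $\varphi$ has finite variation on [a, b], then so does each coordinate function. Hence
each coordinate function is the difference of two increasing functions. Therefore,
each coordinate function, and, consequently, $\varphi$ itself, is differentiable a.e.
We have seen (Cantor function) that the arc length is not always given by
formula (1). It is given by formula (1) if $\varphi$ is absolutely continuous as well as
of finite variation, i.e., if $\varphi$ is absolutely continuous in the classical sense. In
the case of paths the hypothesis that $\varphi$ is of class $C^1$ a.e. can be dropped.

As a second example, consider a surface given by an equation

\[ y_n = F(y_1, \ldots, y_{n-1}), \qquad (y_1, \ldots, y_{n-1}) \in E \subset R^{n-1}. \tag{2} \]

The parametric surface is the function $\varphi: R^{n-1} \to R^n$ given by

\[ \varphi(x) = (x, F(x)), \qquad x \in E \subset R^{n-1}. \tag{3} \]

At a point x where F is of class $C^1$ the Jacobi matrix is

\[
\frac{\partial \varphi}{\partial x} =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\dfrac{\partial F}{\partial x_1} & \dfrac{\partial F}{\partial x_2} & \cdots & \dfrac{\partial F}{\partial x_{n-1}}
\end{pmatrix}.
\]

The product $(\partial \varphi/\partial x)^* (\partial \varphi/\partial x)$ is the matrix $\{a_{ij}\}$ with

\[
a_{ij} =
\begin{cases}
1 + \left( \dfrac{\partial F}{\partial x_i} \right)^2 & \text{if } i = j, \\[2ex]
\left( \dfrac{\partial F}{\partial x_i} \right) \left( \dfrac{\partial F}{\partial x_j} \right) & \text{if } i \ne j.
\end{cases}
\]

This turns out to be a case where we can calculate the Jacobian very nicely by
finding some eigenvalues. To shorten the notation let $a_i = \partial F/\partial x_i$, and let
$a = (a_1, \ldots, a_{n-1})$. Let $T: R^{n-1} \to R^{n-1}$ be the linear transformation defined
by

\[ Tx = x + (x, a)a. \]

We have $(Te_i, e_j) = (e_i, e_j) + a_i a_j$, so the matrix of T is exactly the one above,
that is, $T = d\varphi^* d\varphi$.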




Exercise 1 


Exercise 2 


Exercise 3 


Exercise 4 




The eigenvalues of T are almost in plain sight. If $x \perp a$, then $Tx = x$, so x
is an eigenvector with eigenvalue 1, while $Ta = (1 + |a|^2)a$; so a itself is an
eigenvector with eigenvalue $1 + |a|^2$. Thus, the product of the eigenvalues is
$1 + |a|^2$, and we have

\[ J_\varphi = \sqrt{1 + |a|^2} = \sqrt{1 + |\nabla F|^2}. \tag{4} \]

Is everything all right if $\nabla F = 0$? Note that in this case a is not an eigenvector.
Theorem 6.1 gives

\[ \operatorname{area} \varphi = \int_E \sqrt{1 + |\nabla F|^2} \, dx \tag{5} \]

provided, of course, that $\varphi$ is absolutely continuous and of class $C^1$ at almost
every point.
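The eigenvalue computation can be checked on a concrete vector. In the sketch below the value of a = ∇F is a hypothetical choice made for illustration, with two parameters so that the determinant of I + aaᵀ can be written out directly.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def apply_T(x, a):
    """T x = x + (x, a) a, the transformation from the text."""
    s = dot(x, a)
    return [xi + s * ai for xi, ai in zip(x, a)]


a = [3.0, -4.0]        # hypothetical value of grad F at some point of R^2
x_perp = [4.0, 3.0]    # a vector orthogonal to a

print(apply_T(x_perp, a))  # [4.0, 3.0]: eigenvalue 1
print(apply_T(a, a))       # [78.0, -104.0] = 26 * a: eigenvalue 1 + |a|^2 = 26

# For two parameters the determinant of I + a a^T can be expanded by hand.
det_T = (1 + a[0] ** 2) * (1 + a[1] ** 2) - (a[0] * a[1]) ** 2
jacobian = math.sqrt(det_T)
print(jacobian, math.sqrt(1 + dot(a, a)))  # both equal sqrt(26)
```

The determinant 26 is the product of the two eigenvalues 1 and 26, as formula (4) predicts.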
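Consider the unit sphere in $R^n$, the top half of which has the equation

\[ y_n = \sqrt{1 - y_1^2 - \cdots - y_{n-1}^2}. \]

In this case $F(x) = \sqrt{1 - |x|^2}$ on $|x| < 1$,

\[ \frac{\partial F}{\partial x_i} = -x_i (1 - |x|^2)^{-1/2} \]

and

\[ |\nabla F|^2 = \frac{|x|^2}{1 - |x|^2}, \]

so

\[ J_\varphi(x) = \frac{1}{\sqrt{1 - |x|^2}}. \]

Consequently,

\[ |S|_{n-1} = 2 \int_{B^{n-1}} \frac{dx}{\sqrt{1 - |x|^2}}, \tag{6} \]

where S is the unit sphere in $R^n$, and $B^{n-1}$ is the unit ball in $R^{n-1}$. (Usually the
unit sphere in $R^n$ is written $S^{n-1}$ to indicate that it is a sphere and that its
dimension as a surface is n − 1.)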
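As a quick sanity check of formula (6), here are the two lowest-dimensional cases worked out. The case n = 3 uses ordinary polar coordinates in the plane, which is an assumption borrowed from elementary calculus rather than from this section.

```latex
\[
n = 2: \qquad |S|_1 = 2 \int_{-1}^{1} \frac{dx}{\sqrt{1 - x^2}}
      = 2 \Bigl[ \arcsin x \Bigr]_{-1}^{1} = 2\pi ,
\]
\[
n = 3: \qquad |S|_2 = 2 \int_{|x| < 1} \frac{dx}{\sqrt{1 - |x|^2}}
      = 2 \cdot 2\pi \int_0^1 \frac{\rho \, d\rho}{\sqrt{1 - \rho^2}}
      = 4\pi \Bigl[ -\sqrt{1 - \rho^2} \Bigr]_0^1 = 4\pi .
\]
```

These are the familiar circumference of the unit circle and surface area of the unit sphere in three dimensions.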


Calculate the integral in (6) by using the formulas in Section 9 of Chapter 13 
and the formulas for the gamma function. 


Strictly speaking, the integral in (6) gives the area of the sphere with the section
through $x_n = 0$ removed. Show that the area of this section is 0.


Show that formula (4) holds by calculating determinants instead of eigenvalues. 




8 


THEOREM 
8.1 


Exercise 1 


Proof 


POLAR COORDINATES 


Each point $x \ne 0$ in $R^n$ can be expressed uniquely as $x = r\theta$, where r is a positive
real number and $\theta$ is a point on the unit sphere $S^{n-1}$. The pair $(r, \theta) \in R^1_+ \times
S^{n-1}$ is called the polar coordinates of x. ($R^1_+$ is the set of positive real numbers.)
This representation suggests the possibility of expressing an integral over $R^n$ as
an integral with respect to the product measure $dr \, d\alpha_{n-1}$. Such an expression
is particularly convenient when the integrand has some spherical symmetry.


If $f: R^n \to R^1$ is nonnegative and measurable, then

\[ \int_{R^n} f(y) \, dy = \int_{S^{n-1}} \int_0^\infty f(r\theta) \, r^{n-1} \, dr \, d\alpha_{n-1}. \tag{1} \]


In other words, y should be replaced by $r\theta$ and dy by $r^{n-1} \, dr \, d\alpha_{n-1}$. Note that
while $\alpha_{n-1}$ is not $\sigma$-finite on $R^n$, its restriction to $S^{n-1}$ is finite; so Fubini’s theorem
is applicable here and shows that the integral on the right can be written in
either order.


Use the results of the last section to show that if $E \subset S^{n-1}$, then $\alpha_{n-1}(rE) =
r^{n-1} \alpha_{n-1}(E)$, and then provide a heuristic argument for the fact that dy should be
replaced by $r^{n-1} \, dr \, d\alpha_{n-1}$.
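To illustrate how formula (1) is used when the integrand has spherical symmetry, here is a standard worked example with the Gaussian. It takes for granted the one-dimensional integral $\int_{-\infty}^{\infty} e^{-t^2} dt = \sqrt{\pi}$ and the definition of the gamma function, neither of which is established in this section.

```latex
\[
\pi^{n/2} = \Bigl( \int_{-\infty}^{\infty} e^{-t^2} \, dt \Bigr)^{n}
          = \int_{R^n} e^{-|y|^2} \, dy
          = \int_{S^{n-1}} \int_0^\infty e^{-r^2} r^{n-1} \, dr \, d\alpha_{n-1}
          = \alpha_{n-1}(S^{n-1}) \cdot \tfrac{1}{2} \Gamma\!\left( \tfrac{n}{2} \right),
\]
\[
\text{so that} \qquad \alpha_{n-1}(S^{n-1}) = \frac{2 \pi^{n/2}}{\Gamma(n/2)} .
\]
```

For n = 2 this gives 2π, and for n = 3, with Γ(3/2) = √π/2, it gives 4π.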


Proof In order to get a real proof (as opposed to the heuristic one of the exercise),
we shall first use Theorem 6.2 to transfer the surface integral on the right
to an integral on $R^{n-1}$. Then the whole integral on the right will become
an integral on $R^n$ in which we can change variables by Theorem 4.5 of
Chapter 14. In order to transfer the surface integral to $R^{n-1}$, we shall
want to use just the top half of the sphere, $S^{n-1}_+ = \{\theta : \theta \in S^{n-1},\ \theta_n > 0\}$,
rather than the whole thing. The reason is that $S^{n-1}_+$ is the surface
$\varphi: R^{n-1} \to R^n$ given by

\[ \varphi(x) = \left(x, \sqrt{1 - |x|^2}\right), \qquad x \in B^{n-1}, \]

which we have studied already in Section 7. Then the formula we shall
prove is that

\[ \int_{R^n_+} f(y)\, dy = \int_{S^{n-1}_+} \int_0^\infty f(r\theta)\, r^{n-1}\, dr\, d\alpha_{n-1}. \tag{2} \]

The same formula holds for the bottom half, and the two together give (1).
The function $\varphi$ is clearly one to one and of class $C^1$ everywhere on
$B^{n-1}$, so Theorem 6.2 gives

\[ \int_{S^{n-1}_+} \int_0^\infty f(r\theta)\, r^{n-1}\, dr\, d\alpha_{n-1}
 = \int_{B^{n-1}} \int_0^\infty f\left(rx, r\sqrt{1 - |x|^2}\right) r^{n-1} J_\varphi(x)\, dr\, dx. \tag{3} \]

From the last section we have

\[ J_\varphi(x) = \frac{1}{\sqrt{1 - |x|^2}}. \tag{4} \]

Now, in the integral $\int_{R^n_+} f(y)\, dy$, make the change of variable $y = \psi(x, r)$, with

\[ \psi(x, r) = \left(rx, r\sqrt{1 - |x|^2}\right), \qquad x \in B^{n-1},\ 0 < r < \infty. \]

According to Theorem 4.5 of Chapter 14, we have

\[ \int_{R^n_+} f(y)\, dy = \int_{B^{n-1}} \int_0^\infty f\left(rx, r\sqrt{1 - |x|^2}\right) J_\psi(x, r)\, dr\, dx. \tag{5} \]

Therefore, what remains is to show that

\[ J_\psi(x, r) = \frac{r^{n-1}}{\sqrt{1 - |x|^2}}. \tag{6} \]

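Let $M$ be the Jacobi matrix of $\psi$, and to shorten the notation let $D = \sqrt{1 - |x|^2}$. Then
$J_\psi(x, r) = |\det M|$, where

\[
M = \begin{pmatrix}
r & 0 & \cdots & 0 & x_1 \\
0 & r & \cdots & 0 & x_2 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & r & x_{n-1} \\
-\dfrac{r x_1}{D} & -\dfrac{r x_2}{D} & \cdots & -\dfrac{r x_{n-1}}{D} & D
\end{pmatrix}.
\]

If you know something about determinants, you can establish formula (6)
as follows: Multiply the first row by $x_1/D$ and add to the last row. Then
multiply the second row by $x_2/D$ and add to the last row, and so on. The
result is a matrix $M'$ with the same determinant, and $M'$ looks like this:

\[
M' = \begin{pmatrix}
r & 0 & \cdots & 0 & x_1 \\
0 & r & \cdots & 0 & x_2 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & r & x_{n-1} \\
0 & 0 & \cdots & 0 & \dfrac{1}{D}
\end{pmatrix},
\]

for which the determinant is clearly $r^{n-1}/D$, as required.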
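The determinant computation can be checked numerically. The sketch below builds the Jacobi matrix of $(x, r) \mapsto (rx, r\sqrt{1 - |x|^2})$ at one sample point with $n = 4$ and compares its determinant with $r^{n-1}/D$. The helper names `det` and `jacobi_matrix` and the sample point are illustrative assumptions; the determinant is computed by plain Gaussian elimination rather than by the row operations described above.

```python
import math

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for i in range(c + 1, n):
            m = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= m * a[c][j]
    return d

def jacobi_matrix(x, r):
    """Jacobi matrix of (x, r) -> (r x, r sqrt(1 - |x|^2)), together with D."""
    k = len(x)  # k = n - 1
    D = math.sqrt(1.0 - sum(t * t for t in x))
    rows = []
    for i in range(k):
        row = [0.0] * (k + 1)
        row[i] = r      # derivative of r*x_i with respect to x_i
        row[k] = x[i]   # derivative of r*x_i with respect to r
        rows.append(row)
    rows.append([-r * t / D for t in x] + [D])  # derivatives of r*D
    return rows, D

x, r = [0.3, -0.2, 0.5], 1.7  # a sample point with |x| < 1, so n = 4
M, D = jacobi_matrix(x, r)
lhs = det(M)
rhs = r ** len(x) / D
print(lhs, rhs)
```

The two printed values agree up to rounding, as formula (6) predicts.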




However, the Jacobian can be obtained by the same trick as in Section 7
and without any recourse to determinants, for $M^*M = \{a_{ij}\}$, where

\[
\begin{aligned}
a_{ij} &= r^2\delta_{ij} + \frac{r^2 x_i x_j}{D^2} && \text{if } i < n \text{ and } j < n, \\
a_{in} &= a_{ni} = 0 && \text{if } i < n, \\
a_{nn} &= 1, \\
\delta_{ij} &= \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}
\end{aligned}
\]

If we set $b = (x_1/D, \ldots, x_{n-1}/D, 0)$, and define $Ty = y + (y, b)b$, and $S$ by
$Se_i = r^2 e_i$ if $i < n$ and $Se_n = e_n$, then it is immediately checked that $M^*M$ is the
matrix of $ST$. Now, $J_S = r^{2(n-1)}$, since $S$ has $n - 1$ eigenvectors with eigenvalue
$r^2$ and one with eigenvalue 1; and

\[ J_T = 1 + |b|^2 = 1 + \frac{|x|^2}{D^2} = \frac{1}{D^2}, \]

since $T$ (as in Section 7) has one eigenvector, which is $b$, with eigenvalue
$1 + |b|^2$ and $n - 1$ with eigenvalue 1. Thus, we get

\[ J_\psi^2 = J_{M^*M} = J_S J_T = \left(\frac{r^{n-1}}{D}\right)^2. \]

This establishes formula (6) and, therefore, Theorem 8.1, except for one point
that we have let slip by, and shall continue to let slip by unless you want to try
the following exercise:

Exercise 2 The function $f$ on $R^n$ is measurable if and only if the function $g(r, \theta) = f(r\theta)$ is
measurable on $R^1_+ \times S^{n-1}$ with respect to the measure $dr\, d\alpha_{n-1}$.


Ordinarily, we shall write $d\theta$ for the restriction of $\alpha_{n-1}$ to $S^{n-1}$. In this
case the formula for integration in polar coordinates becomes

\[ \int_{R^n} f(y)\, dy = \int_{S^{n-1}} \int_0^\infty f(r\theta)\, r^{n-1}\, dr\, d\theta. \tag{7} \]


In the case $n = 2$ there is some ambiguity in the usual notations, which may
be perplexing at first, but should not cause real confusion. As a curve (i.e.,
one-dimensional surface), $S^1$ is the function $\varphi: R^1 \to R^2$ given by

\[ \varphi(t) = (\cos t, \sin t), \qquad 0 \le t < 2\pi. \]

As we have seen in Section 7, $J_\varphi = |\varphi'| = 1$. Therefore, when $\theta \in S^1$ and
$t \in [0, 2\pi)$ are related by $\theta = \varphi(t)$, we have $d\theta = dt$. Often $\theta$ and $t$ are simply
identified, and the polar coordinates of a point in the plane are considered
interchangeably as a pair $(r, \theta) \in R^1_+ \times S^1$, or as a pair $(r, \theta) \in R^1_+ \times
[0, 2\pi)$. Since $d\theta = dt$, this confusion does not cause difficulty. The polar
coordinates described in Section 9 of Chapter 13 are the pair $(r, \theta) \in R^1_+ \times
[0, 2\pi)$.


Exercise 3 If the volume of the unit ball in $R^n$ is calculated by Theorem 8.1, the result is
immediately seen to be

\[ |B^n|_n = \frac{|S^{n-1}|_{n-1}}{n}. \tag{8} \]

Along with formula (14), Section 9 of Chapter 13, this gives

\[ |S^{n-1}|_{n-1} = \frac{2\pi^{n/2}}{\Gamma(n/2)}. \tag{9} \]
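The two formulas of this exercise are easy to tabulate. The following short sketch evaluates them with the gamma function from the Python standard library and recovers the familiar values $2\pi$, $4\pi$ for the circle and the ordinary sphere and $\pi$, $4\pi/3$ for the disc and the ordinary ball. The function names are illustrative.

```python
import math

def sphere_area(n):
    """(n-1)-dimensional area of the unit sphere in R^n: 2 pi^(n/2) / Gamma(n/2)."""
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)

def ball_volume(n):
    """n-dimensional volume of the unit ball in R^n: sphere area divided by n."""
    return sphere_area(n) / n

for n in (1, 2, 3, 4):
    print(n, sphere_area(n), ball_volume(n))
```

For $n = 1$ the "sphere" consists of the two points $\pm 1$ and the "ball" is the interval $[-1, 1]$, so both values are 2.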


16 The Brouwer Degree


1 INTRODUCTION


The degree of a continuous function $f: S^{n-1} \to S^{n-1}$ is an integer that measures
the number of times the parametric surface $f$ "wraps around" the unit sphere
$S^{n-1}$. The way to define this integer and the fact that it reflects fundamental
properties of the function $f$ itself were discovered about 1900 by the Dutch
mathematician L. E. J. Brouwer.

The first idea for getting the number of times that $f$ wraps around $S^{n-1}$
would be simply to pick a point $y$ and count the number of points $x$ with
$f(x) = y$. This would give just the old multiplicity function $N(S^{n-1}; y)$ of
Chapter 15. Consider the function $f: S^1 \to S^1$ representing the path that starts
at the top of the circle and goes counterclockwise to the bottom, then returns
clockwise to the top, and finally goes counterclockwise all the way around
(Figure 1). If $y$ is on the right half of the circle, then $N(S^1; y)$ is 1; if $y$ is on the
left half, then $N(S^1; y)$ is 3, while at the top and bottom it is 2. On the other
hand, it is clear that the path wraps around once. This suggests that we
cannot simply count the number of points $x$ with $f(x) = y$, but rather that a
given point $x$ should be counted $+1$ times if we are going in a counterclockwise
direction at that point, and $-1$ times if we are going in a clockwise direction.
If the counting is done this way and the result is called $\deg(f; y)$, then we have
$\deg(f; y) = 1$ at every point $y$ of the circle, except the two points at the top and
bottom where we do not know quite what to think.

[Portrait: L. E. J. Brouwer]

[Figure 1]


Exercise 1 Draw several more paths $f: S^1 \to S^1$ and notice that the function $\deg(f; y)$ takes
the same value at all points $y$ except a few where you do not know what to
think. This constant value is the number $\deg f$ that we are looking for.

Exercise 2 Let $f: S^1 \to S^1$ be of class $C^1$. Try to find an analytic way to express the fact
that at a point $x \in S^1$ the motion is counterclockwise.


In Section 2 we shall carry out the details of this construction for functions
$f: S^{n-1} \to S^{n-1}$ that are highly differentiable. In the following section we shall
extend the results to continuous functions by approximating them with $C^\infty$
functions. In the case of an arbitrary continuous function there is no way to do
a direct counting procedure. It may well be, for example, that for each point $y$
there are infinitely many points $x$ with $f(x) = y$.


2 THE DEGREE FOR $C^\infty$ FUNCTIONS


This section contains the construction of the degree for functions $f: S^{n-1} \to S^{n-1}$
of class $C^\infty$. [The construction is the same for functions of class $C^r$ with $r >
n(n + 1)/2$, but we can obtain the results for continuous functions by using $C^\infty$
approximations, so there is no point in worrying about class $C^r$.]

The number $\deg(f; y)$ is defined as follows: First set

\[ \tilde f(x) = |x|\, f\!\left(\frac{x}{|x|}\right) \quad \text{for } x \neq 0; \tag{1} \]

then define

\[
\deg(f; y) = \begin{cases}
\displaystyle\sum_{f(x) = y} \operatorname{sign} \det d\tilde f(x) & \text{if } y \text{ is a regular value of } f, \\
0 & \text{otherwise.}
\end{cases} \tag{2}
\]


The number $\operatorname{sign} \det d\tilde f(x)$ is (by definition) 1 if $\det d\tilde f(x) > 0$ and $-1$ if
$\det d\tilde f(x) < 0$. This is the expression that tells whether to count a given point $x$
plus 1 times or minus 1 times.


Exercise 1 Show that $\operatorname{sign} \det d\tilde f(x)$ answers the question raised in Exercise 2 of the last
section.


Note that if y is a regular value of f, then the set {x:f(x) = y} is a compact 
smooth manifold of dimension 0, hence a finite set of points; so the sum in (2) 
is just a finite sum of plus and minus 1’s. 
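For $n = 2$ the signed count can be carried out mechanically. The sketch below models the path of Figure 1 of the Introduction by a piecewise linear angle function `lift(t)`, $0 \le t \le 1$ (this parametrization is an assumption made for illustration), and takes for granted what Exercise 1 asks the reader to show, namely that the sign attached to a point is the sign of the derivative of the angle. For a target angle `beta` it returns both the unsigned count $N$ and the signed count.

```python
import math

def lift(t):
    """Angle of the path at parameter t in [0, 1] (piecewise linear model of Figure 1)."""
    if t <= 0.25:
        return math.pi / 2 + 4 * math.pi * t               # top to bottom, counterclockwise
    if t <= 0.5:
        return 3 * math.pi / 2 - 4 * math.pi * (t - 0.25)  # back to the top, clockwise
    return math.pi / 2 + 4 * math.pi * (t - 0.5)           # once around, counterclockwise

def counts(beta, steps=10_000):
    """Return (N, signed count) of parameters t with lift(t) = beta modulo 2*pi."""
    def level(t):
        return math.floor((lift(t) - beta) / (2 * math.pi))
    n_unsigned = 0
    signed = 0
    prev = level(0.0)
    for k in range(1, steps + 1):
        cur = level(k / steps)
        n_unsigned += abs(cur - prev)  # a crossing of the target angle in this step
        signed += cur - prev           # +1 if counterclockwise, -1 if clockwise
        prev = cur
    return n_unsigned, signed

print(counts(0.3))  # a point on the right half of the circle
print(counts(3.0))  # a point on the left half of the circle
```

On the right half the unsigned count is 1 and on the left half it is 3, while the signed count is 1 in both cases, in agreement with the discussion in the Introduction.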


Exercise 2 Show that $\tilde f$ is of class $C^\infty$ on $R^n - \{0\}$ and that for each $y \in S^{n-1}$ the sets
$\{x : f(x) = y\}$ and $\{x : \tilde f(x) = y\}$ are the same.

Exercise 3 Show that a point $y \in S^{n-1}$ is a regular value of $f$ if and only if it is a regular
value of $\tilde f$.

Exercise 4 If $K$ is the set of critical points of $f$ and $L = f(K)$ is the set of critical values, then
$L$ is a compact set of $\alpha_{n-1}$ measure 0, and $\deg(f; y)$ is locally constant on $S^{n-1} - L$;
that is, each point of $S^{n-1} - L$ has a neighborhood $G$ in $R^n$ that does not meet $L$
and such that $\deg(f; y)$ is constant on $G \cap S^{n-1}$.


The main fact to be proved is that $\deg(f; y)$ is constant on $S^{n-1} - L$, not
just locally constant. Since $L$ is small (has $\alpha_{n-1}$ measure 0), what pops into
mind is a connectedness argument, but $S^{n-1} - L$ need not be connected.

Exercise 5 Give an example.




The proof of the fact that $\deg(f; y)$ is constant on $S^{n-1} - L$ is based on the
following theorem, which also contains other important information.

THEOREM 2.1 Let $F: S^1 \times S^{n-1} \to S^{n-1}$ be of class $C^\infty$, and let

\[ f(x) = F(t_0, x) \quad \text{and} \quad g(x) = F(t_1, x). \]

If $y \in S^{n-1}$ is not a critical value of either $f$ or $g$, then $\deg(f; y) = \deg(g; y)$.

[Think of the theorem as saying that if $f$ can be "deformed" into $g$, the
functions $f_t(x) = F(t, x)$ effecting the deformation, then $\deg(f; y) = \deg(g; y)$.]


Proof In the first place it can be assumed that $y$ is not a critical value of any of
the three $f$, $g$, or $F$. Indeed, by Exercise 4 there is a neighborhood $G$ of $y$
in $R^n$ containing no critical value of $f$ and no critical value of $g$ and such
that $\deg(f; z)$ and $\deg(g; z)$ are both constant on $G \cap S^{n-1}$. The set $L$ of
critical values of $F$ has $\alpha_{n-1}$ measure 0 by Sard's theorem, so in particular
it cannot fill up the whole set $G \cap S^{n-1}$. If we choose a point $z \in
(G \cap S^{n-1}) - L$, and if we know the theorem for such $z$, then we have $\deg(f; y) =
\deg(f; z) = \deg(g; z) = \deg(g; y)$. Henceforth, we assume that $y$ is not
a critical value of $f$, $g$, or $F$. Furthermore, we assume that $y$ is actually a
value of $F$; for if it is not, then it is not a value of $f$, nor of $g$, in which case
the definition gives $\deg(f; y) = 0 = \deg(g; y)$. Thus, $y$ is a regular value
of $F$.

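Since $y$ is a regular value of $F$, the set

\[ M_y = \{(t, x) : F(t, x) = y\} \]

is a compact smooth manifold of dimension 1. It is plain that

\[
\deg(f; y) = \sum_{(t_0, x) \in M_y} \operatorname{sign} \det \frac{\partial \tilde F}{\partial x},
\qquad
\deg(g; y) = \sum_{(t_1, x) \in M_y} \operatorname{sign} \det \frac{\partial \tilde F}{\partial x},
\]

where

\[ \tilde F(t, x) = |x|\, F\!\left(t, \frac{x}{|x|}\right), \qquad x \neq 0. \]

What we shall show is that if $M$ is any component of $M_y$, then

\[
\sum_{(t_0, x) \in M} \operatorname{sign} \det \frac{\partial \tilde F}{\partial x}
= \sum_{(t_1, x) \in M} \operatorname{sign} \det \frac{\partial \tilde F}{\partial x}. \tag{3}
\]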


[Figure 2: the sections $\{t_0\} \times S^{n-1}$ and $\{t_1\} \times S^{n-1}$]


This will prove the theorem because the compact smooth manifold $M_y$ is
the disjoint union of its components and the components are finite in
number.

The advantage in dealing with the component $M$ is that it is a compact
connected smooth one-dimensional manifold, so it has a parametric representation
by arc length. That is, there is a $C^\infty$ function $\varphi: R^1 \to M$ such
that $\varphi(R^1) = M$, $\varphi(s + a) = \varphi(s)$ for all $s$, $|\varphi'| = 1$, and $\varphi$ is one to one on
the interval $[0, a)$. (Theorem 3.14 of Chapter 11.) In order to make use
of $\varphi$ we shall have to find its connection with the numbers $\operatorname{sign} \det \partial \tilde F/\partial x$.
When $n = 2$, the picture looks as shown in Figure 2.

In doing the calculations we shall be just a little bit illogical and shall
identify the circle $S^1$ with the interval $(0, 2\pi]$ so that we can speak of $\partial F/\partial t$,
of the idea that $t_1 > t_0$, and so on. To be perfectly correct about this we
should put $t(r) = (\cos r, \sin r)$ and replace $F(t, x)$ by $F(t(r), x)$, but this
complicated notation introduces more confusion than it eliminates. If $S^1$
is identified with $(0, 2\pi]$, then automatically $S^1 \times S^{n-1}$ is identified with a
subset of $R^{n+1}$. We shall use the subscript 0 to indicate the first coordinate.
With this notation the connection between $\varphi$ and $\operatorname{sign} \det \partial \tilde F/\partial x$ is that

\[ \operatorname{sign} \varphi_0'(s) = \operatorname{sign} \det \frac{\partial \tilde F}{\partial x}, \tag{4} \]

the right side being calculated at the point $\varphi(s)$.

To prove formula (4) we calculate the tangent vector to $M$ at $\varphi(s)$ in two
different ways. On the one hand, the tangent vector is just $\varphi'(s)$. On the
other, since $M$ is defined by the equation $F(t, x) = y$, the normal is spanned
by the vectors $\nabla \tilde F_j$, $j = 1, \ldots, n$. What we need is a vector orthogonal to
each of these. Note that the Jacobi matrix of $\tilde F$ is the $n \times (n + 1)$ matrix

\[ \begin{pmatrix} \nabla \tilde F_1 \\ \vdots \\ \nabla \tilde F_n \end{pmatrix}. \]

Let $A_j$ be the square matrix obtained by skipping the $j$th column (starting
with $j = 0$), and let $a_j = (-1)^j \det A_j$. At each point of $M$ the vector $a$ is $\neq 0$
because each point of $M$ is a regular point of $F$. Furthermore, $a$ is orthogonal
to each $\nabla \tilde F_i$, as can be seen by putting $\nabla \tilde F_i$ at the bottom of the Jacobi matrix
and calculating the resulting $(n + 1) \times (n + 1)$ determinant by Cramer's rule.
The determinant is 0 because two rows of the matrix are the same. It is also
the inner product of $a$ with $\nabla \tilde F_i$ by Cramer's rule. Since the tangent space to $M$
is one dimensional, we conclude that

\[ a(\varphi(s)) = \rho(s)\, \varphi'(s), \qquad \rho(s) \neq 0. \tag{5} \]

Thus, $\rho$ is either always positive or always negative, and, replacing $\varphi(s)$ by
$\varphi(-s)$ if necessary, we can suppose that $\rho$ is always positive. In this case $\operatorname{sign}
a_0(\varphi(s)) = \operatorname{sign} \varphi_0'(s)$, while obviously $a_0 = \det \partial \tilde F/\partial x$. This proves formula
(4), and now we will use formula (4) to prove the theorem.
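The orthogonality of the vector $a$ to the rows of the Jacobi matrix is a purely algebraic fact about an $n \times (n+1)$ matrix, and it can be checked on any example. The sketch below uses an arbitrary $3 \times 4$ matrix (a made-up stand-in for a Jacobi matrix, not derived from any particular $F$) and forms $a_j = (-1)^j \det A_j$ with columns numbered from 0. The helper names are illustrative.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(a[i][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for i in range(c + 1, n):
            m = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= m * a[c][j]
    return d

def cofactor_vector(jac):
    """a_j = (-1)^j det A_j, where A_j is jac with column j removed (j = 0..n)."""
    n = len(jac)
    return [(-1) ** j * det([row[:j] + row[j + 1:] for row in jac])
            for j in range(n + 1)]

# An arbitrary 3 x 4 matrix of rank 3, standing in for the Jacobi matrix.
jac = [[2.0, -1.0, 0.5, 3.0],
       [0.0, 1.5, -2.0, 1.0],
       [1.0, 4.0, 0.25, -0.5]]
a = cofactor_vector(jac)
dots = [sum(r * c for r, c in zip(row, a)) for row in jac]
print(a)
print(dots)
```

The inner products in `dots` vanish up to rounding, and `a` itself is nonzero because the matrix has full rank, which mirrors the role of regularity in the argument above.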

If the sections $\{t_0\} \times S^{n-1}$ and $\{t_1\} \times S^{n-1}$ are removed from $S^1 \times S^{n-1}$,
then what is left consists of two connected components. With the identification
of $S^1$ with $[0, 2\pi]$ and $t_0 < t_1$ they are

\[
C = \{(t, x) : t_0 < t < t_1\}, \qquad
D = \{(t, x) : t < t_0 \text{ or } t > t_1\}.
\]

$D$ is connected because we are really on the circle $S^1$ rather than the interval
$[0, 2\pi]$. The points $s$ with $\varphi_0(s) = t_0$ are the ones that contribute to $\deg(f; y)$,
and the points $s$ with $\varphi_0(s) = t_1$ are the ones that contribute to $\deg(g; y)$. At
each such point we have $\varphi_0'(s) \neq 0$ because of formula (5) and the regularity
of $f$ and $g$. For the same reason there are only a finite number of these points
in the interval $[0, a)$. Consider two adjacent ones, $s_0$ and $s_1$ with $s_0 < s_1$.
Since $\varphi((s_0, s_1))$ is connected, it must be contained in either $C$ or $D$. For
definiteness let us suppose that it is contained in $C$ and that $\varphi_0(s_0) = t_0$. For
$s > s_0$ we have $\varphi(s) \in C$; hence $\varphi_0(s) > t_0$; therefore, $\varphi_0'(s_0) > 0$. [Remember
that $\varphi_0'(s_0) \neq 0$!] Now there are two cases to consider.



Case a. Suppose that $\varphi_0(s_1) = t_0$. In this case $\varphi_0'(s_1) < 0$; for if $s < s_1$,
then $\varphi(s) \in C$ so that $\varphi_0(s) > t_0$. Thus, the contribution of the two terms

\[ \operatorname{sign} \varphi_0'(s_0) + \operatorname{sign} \varphi_0'(s_1) \]

to $\deg(f; y)$ is 0.

Case b. Suppose that $\varphi_0(s_1) = t_1$. In this case $\varphi_0'(s_1) > 0$; for if $s < s_1$,
then $\varphi(s) \in C$ so that $\varphi_0(s) < t_1$. Consequently, the two points $s_0$ and $s_1$
together contribute $+1$ to $\deg(f; y)$ and $+1$ to $\deg(g; y)$.

Now the theorem is proved, for it has been shown that any pair of adjacent
points $s_0$ and $s_1$ with $\varphi_0(s) = t_0$ or $t_1$ contribute the same thing to both
$\deg(f; y)$ and $\deg(g; y)$.


Exercise 6 To pair up the points in this way it is necessary to know that the number of
points $s \in [0, a)$, with $\varphi_0(s) = t_0$ or $t_1$, is even. Why is this true?

Exercise 7 In what situations would the contribution be $-1$ to both degrees?

THEOREM 2.2 If neither $y$ nor $z$ is a critical value of $f$, then $\deg(f; y) = \deg(f; z)$.

Proof Let $U_t$ be an orthogonal transformation on $R^n$ that effects a rotation
through an angle $t$ in the plane spanned by $y$ and $z$ and leaves the orthogonal
complement of this plane alone, and set $F(t, x) = U_t f(x)$.

Exercise 8 Write a formula for $U_t$ to show that $F$ is of class $C^\infty$.

Choose $t_1$ so that $U_{t_1} z = y$, and set $g(x) = F(t_1, x)$. Apply Theorem 2.1 to $g$
and to $f = U_0 f$. It is immediately checked that $\{x : f(x) = z\} = \{x : g(x) = y\}$
and that $\deg(f; z) = \deg(g; y)$. Theorem 2.1 gives $\deg(g; y) = \deg(f; y)$.

DEFINITION 2.3 If $f: S^{n-1} \to S^{n-1}$ is of class $C^\infty$, then $\deg f$ is the common value of $\deg(f; y)$
on the set of points $y$ that are not critical values of $f$.

THEOREM 2.4 If $S^{n-1} \xrightarrow{f} S^{n-1} \xrightarrow{g} S^{n-1}$, then $\deg(g \circ f) = \deg g \deg f$.

Exercise 9 Prove the theorem.


This construction of Brouwer's degree comes from an elegant little book of
John Milnor called Topology from the Differentiable Viewpoint.


[Portrait: John Milnor]


3 THE DEGREE FOR CONTINUOUS FUNCTIONS 


The degree of a continuous function $f: S^{n-1} \to S^{n-1}$ can be obtained by approximation
in the norm

\[ \|u\| = \sup_{x \in S^{n-1}} |u(x)|. \tag{1} \]

The basis for this is the following lemma.

LEMMA 3.1 If $f, g: S^{n-1} \to S^{n-1}$ are of class $C^\infty$ and $\|f - g\| < 2$, then $\deg f = \deg g$.

Proof If $t = (t_1, t_2)$ is a point of $S^1$, set

\[ G(t, x) = t_1^2 f(x) + t_2^2 g(x); \]

then set $F(t, x) = G(t, x)/|G(t, x)|$. Note that $G(t, x)$ is never 0. If it
were, then we would have $t_1^2 f(x) = -t_2^2 g(x)$, and taking absolute values
we would get $t_1^2 = t_2^2$; hence $f(x) = -g(x)$, which is impossible if $\|f - g\| <
2$. Since $G$ is of class $C^\infty$ and never 0, it follows that $F$ is of class $C^\infty$ and
we can apply Theorem 2.1.


DEFINITION 3.2 If $f: S^{n-1} \to S^{n-1}$ is continuous, then $\deg f = \deg f_0$, where $f_0: S^{n-1} \to S^{n-1}$
is any $C^\infty$ function with $\|f - f_0\| < 1$.

If $f_0$ and $g_0$ are any two $C^\infty$ functions satisfying $\|f - f_0\| < 1$ and $\|f - g_0\| <
1$, then $\|f_0 - g_0\| < 2$; the lemma shows that $\deg f_0 = \deg g_0$, so the definition
does not depend on the particular $C^\infty$ function $f_0$. In order to show that there
exist $C^\infty$ functions satisfying $\|f - f_0\| < 1$, we use the Stone-Weierstrass approximation
theorem as follows: Given $\epsilon > 0$ we use Stone-Weierstrass to find a $C^\infty$
function $g_0: S^{n-1} \to R^n$ such that $\|f - g_0\| < \epsilon$, and we set $f_0(x) = g_0(x)/|g_0(x)|$.
Since $|f(x)| = 1$, we have

\[ \bigl|1 - |g_0(x)|\bigr| \le \|f - g_0\| < \epsilon. \]

If $\epsilon < 1$, it follows that $g_0(x) \neq 0$ (so the definition of $f_0$ makes sense), and also
that

\[ f_0(x) - g_0(x) = \left(\frac{1}{|g_0(x)|} - 1\right) g_0(x), \]

which gives

\[ |f_0(x) - g_0(x)| = \bigl|1 - |g_0(x)|\bigr|. \]

Hence, $\|f_0 - g_0\| < \epsilon$, and finally

\[ \|f - f_0\| < 2\epsilon. \tag{2} \]

This shows that a continuous function from $S^{n-1}$ to $S^{n-1}$ can be approximated
arbitrarily well by a $C^\infty$ function from $S^{n-1}$ to $S^{n-1}$ in the norm (1).
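The normalization step can be watched on a concrete example. In the sketch below, $f$ is a map of the circle to itself, `g0` is a hand-made smooth perturbation of it with $\|f - g_0\| < \epsilon = 0.3$ (chosen by hand for illustration, not produced by the Stone-Weierstrass theorem), and `f0` is the normalized function $g_0/|g_0|$. The suprema are approximated on a fine grid of the circle.

```python
import math

def f(t):
    """A smooth map of the circle to itself, written in terms of the angle t."""
    return (math.cos(2 * t), math.sin(2 * t))

def g0(t):
    """A perturbation of f into R^2 with |g0 - f| <= 0.29 (hypothetical example)."""
    fx, fy = f(t)
    c = 0.29 / math.sqrt(2.0)
    return (fx + c * math.cos(5 * t), fy + c * math.sin(3 * t))

def f0(t):
    """The normalized function g0 / |g0|, which takes values on the circle."""
    gx, gy = g0(t)
    norm = math.hypot(gx, gy)
    return (gx / norm, gy / norm)

def sup_dist(u, v, steps=20_000):
    """Grid approximation of the sup norm of u - v over the circle."""
    return max(math.dist(u(2 * math.pi * k / steps), v(2 * math.pi * k / steps))
               for k in range(steps))

eps = 0.3
d_fg0 = sup_dist(f, g0)
d_ff0 = sup_dist(f, f0)
print(d_fg0, d_ff0, 2 * eps)
```

The first distance stays below $\epsilon$ and the second below $2\epsilon$, as the estimate (2) predicts.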


THEOREM 3.3 If $f, g: S^{n-1} \to S^{n-1}$ are continuous and $\|f - g\| < 2$, then $\deg f = \deg g$.

Proof Take $0 < \epsilon < 2 - \|f - g\|$ and use (2) to find $f_0$ and $g_0$ of class $C^\infty$ so that
$\|f - f_0\| < \epsilon/2$ and $\|g - g_0\| < \epsilon/2$. Then $\|f_0 - g_0\| < 2$, and Lemma
3.1 gives

\[ \deg f = \deg f_0 = \deg g_0 = \deg g. \]

DEFINITION 3.4 Let $f, g: S^{n-1} \to S^{n-1}$ be continuous. We write $f \sim g$ and say that $f$ is
homotopic to $g$ if there is a continuous function $F: [t_0, t_1] \times S^{n-1} \to S^{n-1}$
with

\[ f(x) = F(t_0, x) \quad \text{and} \quad g(x) = F(t_1, x). \]

(This is just our old notion of a deformation, but we no longer have to stay
within the framework of $C^\infty$ functions and manifolds.)

THEOREM 3.5 Let $f, g: S^{n-1} \to S^{n-1}$ be continuous. If $f \sim g$ then $\deg f = \deg g$.

Proof Set $f_t(x) = F(t, x)$. Since $[t_0, t_1] \times S^{n-1}$ is compact, $F$ is uniformly continuous.
From this it follows that there is a $\delta > 0$ such that if $|t - s| < \delta$,
then $\|f_t - f_s\| < 2$. This fact and Theorem 3.3 give that the function
$d(t) = \deg f_t$ is continuous on the interval $[t_0, t_1]$. Since it takes only
integer values, it must be constant.


THEOREM 3.6 If $S^{n-1} \xrightarrow{f} S^{n-1} \xrightarrow{g} S^{n-1}$, then $\deg(g \circ f) = \deg g \deg f$.

Proof Choose $f_0$ and $g_0$ of class $C^\infty$ and approximating $f$ and $g$. Then we have

\[ |g(f(x)) - g_0(f_0(x))| \le |g(f(x)) - g_0(f(x))| + |g_0(f(x)) - g_0(f_0(x))|. \]

Now, $g_0$ is of class $C^\infty$, so it is Lipschitzian; if we write $M$ for the Lipschitz
constant, then we have

\[ \|g \circ f - g_0 \circ f_0\| \le \|g - g_0\| + M\|f - f_0\|. \]

First we choose $g_0$ so that $\|g - g_0\| < 1$. This fixes $M$, and we choose
$f_0$ so that $M\|f - f_0\| < 1$ (and also $\|f - f_0\| < 1$, so that $\deg f_0 = \deg f$).
The result is that $\|g \circ f - g_0 \circ f_0\| < 2$, so by
Theorems 3.3 and 2.4 we have

\[ \deg(g \circ f) = \deg(g_0 \circ f_0) = \deg g_0 \deg f_0 = \deg g \deg f. \]


Theorems 3.3, 3.5, and 3.6 allow the computation of the degree in many
important cases.

Exercise 1 If $f: S^{n-1} \to S^{n-1}$ does not map $S^{n-1}$ onto $S^{n-1}$, then $\deg f = 0$. [Hint: It is plain
from the initial definition that if $g$ is constant, then $\deg g = 0$. If $a$ is a point
outside the range of $f$, $b$ the diametrically opposite point, and $g$ the constant
$g(x) = b$, then $\|f - g\| < 2$.]

Exercise 2 If $f: S^{n-1} \to S^{n-1}$ is both one to one and onto, then $\deg f = \pm 1$. [Hint: In
this case there exists $g$ with $g \circ f = I$, and it is plain from the initial definition
that $\deg I = 1$.]

Exercise 3 If $f: S^{n-1} \to S^{n-1}$ is of class $C^1$, then

\[ \deg f = \frac{1}{|S^{n-1}|_{n-1}} \int_{S^{n-1}} \det d\tilde f(x)\, d\alpha_{n-1}. \]

[Hint: Do first the case where $f$ is of class $C^\infty$. In this case $\deg(f; y)$ is similar to
$N(S^{n-1}; y)$, except that each point $x$ is counted with either a $+1$ or a $-1$.
This fact and the Jacobian formula for surface area suggest the formula

\[ \int_{S^{n-1}} \deg(f; y)\, d\alpha_{n-1} = \int_{S^{n-1}} \det d\tilde f(x)\, d\alpha_{n-1}, \]

which is just what is to be proved because $\deg(f; y)$ is constant almost everywhere
on $S^{n-1}$. Now approximate a $C^1$ function by $C^\infty$ functions. (This is not
a completely trivial exercise. You will have to use the ideas of Chapter 15 and
notice that the Jacobian $J_f$ is essentially $|\det d\tilde f|$.)]


4 SOME APPLICATIONS OF THE DEGREE


In most applications we are not presented directly with a function from $S^{n-1}$ to
$S^{n-1}$. One common situation is to have a function from $S^{n-1}$ to $R^n - \{0\}$.
For such functions the degree is defined as follows:

DEFINITION 4.1 If $f: S^{n-1} \to R^n - \{0\}$, then $\deg f = \deg \hat f$, where

\[ \hat f(x) = \frac{f(x)}{|f(x)|}. \]

Exercise 1 If $S^{n-1} \xrightarrow{f} S^{n-1} \xrightarrow{g} R^n - \{0\}$, then $\deg(g \circ f) = \deg g \deg f$.

DEFINITION 4.2 If $f, g: S^{n-1} \to R^n - \{0\}$, then we write $f \sim g$ in $R^n - \{0\}$ and say that
$f$ is homotopic to $g$ in $R^n - \{0\}$, if there is a continuous function $F: [t_0, t_1] \times
S^{n-1} \to R^n - \{0\}$ such that

\[ f(x) = F(t_0, x) \quad \text{and} \quad g(x) = F(t_1, x). \]

Exercise 2 If $f \sim g$ in $R^n - \{0\}$, then $\deg f = \deg g$.


Now let us calculate the degree of the restriction of a linear transformation
$T$ to $S^{n-1}$. $T$ must be nonsingular, of course, for otherwise there is a point
$x \in S^{n-1}$ with $Tx = 0$, and $T$ is not a function from $S^{n-1}$ to $R^n - \{0\}$.

THEOREM 4.3 If $T$ is a nonsingular linear transformation from $R^n$ to $R^n$, then

\[ \deg T|_{S^{n-1}} = \operatorname{sign} \det T. \]

Proof Suppose first that $T = U$ is orthogonal. In this case $U$ maps $S^{n-1}$ into
$S^{n-1}$, and we can go back to the initial definition of the degree in Section 2.
It is plain that $\tilde U x = Ux$ and that for a given point $y$ there is only one
point $x$ with $Ux = y$, so the initial definition gives the required formula
directly.

Suppose next that $T = H$ is strictly positive definite. In this case
we have in $R^n - \{0\}$ the homotopy

\[ H_t x = (1 - t)Hx + tx, \qquad 0 \le t \le 1, \]

which gives $H|_{S^{n-1}} \sim I|_{S^{n-1}}$ in $R^n - \{0\}$; hence $\deg H|_{S^{n-1}} = 1$. On the
other hand, $\operatorname{sign} \det H = 1$ because the determinant is the product of the
eigenvalues, which are all positive.




In the general case we have $T = HU$ and can apply Exercise 1
and the formula $\det T = \det H \det U$.
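In the plane the statement can be tested numerically, taking for granted that for $n = 2$ the degree of a map from $S^1$ to $R^2 - \{0\}$ is the number of times its image curve winds around the origin. The sketch below computes that winding number for the curve $t \mapsto T(\cos t, \sin t)$ by accumulating small changes of angle and compares it with the sign of the determinant. The function names and the two sample matrices are illustrative.

```python
import math

def winding_number(curve, steps=4000):
    """Winding number about 0 of the closed curve t -> curve(t), 0 <= t <= 2*pi."""
    total = 0.0
    x0, y0 = curve(0.0)
    prev = math.atan2(y0, x0)
    for k in range(1, steps + 1):
        x, y = curve(2 * math.pi * k / steps)
        ang = math.atan2(y, x)
        d = ang - prev
        # undo the jump of atan2 across the negative real axis
        if d > math.pi:
            d -= 2 * math.pi
        elif d < -math.pi:
            d += 2 * math.pi
        total += d
        prev = ang
    return round(total / (2 * math.pi))

def restrict(T):
    """The restriction of the 2 x 2 matrix T to the unit circle, as a curve."""
    (a, b), (c, d) = T
    return lambda t: (a * math.cos(t) + b * math.sin(t),
                      c * math.cos(t) + d * math.sin(t))

def sign_det(T):
    (a, b), (c, d) = T
    return 1 if a * d - b * c > 0 else -1

T1 = ((2.0, 1.0), (0.0, 3.0))  # determinant 6
T2 = ((1.0, 2.0), (3.0, 1.0))  # determinant -5
for T in (T1, T2):
    print(winding_number(restrict(T)), sign_det(T))
```

In both cases the winding number equals the sign of the determinant: the image ellipse is traversed once, counterclockwise when the determinant is positive and clockwise when it is negative.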


THEOREM 4.4 Let $f: S^{n-1} \to S^{n-1}$ be continuous. If $n$ is odd, then there is at least one
point $x \in S^{n-1}$ with $f(x) = \pm x$.

Proof If $f(x) \neq x$ for all $x \in S^{n-1}$, then $\|f + I\| < 2$, and Theorem 3.3 gives
$\deg f = \deg(-I)$, which is $-1$ by Theorem 4.3. If $f(x) \neq -x$ for all
$x \in S^{n-1}$ then $\|f - I\| < 2$, so $\deg f = \deg I = 1$. Not both are possible.

Let $M$ be a smooth manifold of dimension $m$ in $R^n$. A tangent vector
field to $M$ is a continuous function $v: M \to R^n$ such that for each point $x \in M$,
$v(x)$ lies in the tangent space to $M$ at $x$. Theorem 4.4 has the following immediate
corollary.

COROLLARY 4.5 There is no nonvanishing tangent vector field to $S^{n-1}$ if $n$ is odd.

Exercise 3 Prove the corollary and show that there are nonvanishing tangent vector fields
to $S^{n-1}$ if $n$ is even.

Remark 1 A very interesting and difficult question is whether it is possible to choose at
each point $x \in S^{n-1}$ vectors $e_1(x), \ldots, e_{n-1}(x)$ that form a basis of the tangent
space at $x$, and to do this so that the functions $e_j(x)$ are continuous. Corollary
4.5 shows that it certainly is not possible (even to choose one single nonzero
tangent vector) unless $n$ is even. Quite recently J. F. Adams proved the very
difficult theorem that it is possible only if $n = 2$, 4, or 8.


THEOREM 4.6 Let $F: B(0; 1) \to R^n$ be continuous, let $f$ be its restriction to $S^{n-1}$, and let $y$
be a point of $R^n$. If $\deg(f - y) \neq 0$, then there is a point $x \in B(0; 1)$
with $F(x) = y$.

Proof If $F$ does not take the value $y$, then we have in $R^n - \{0\}$ the homotopy

\[ h(t, x) = F(tx) - y, \qquad 0 \le t \le 1, \]

from the constant $F(0) - y$ to the function $f - y$; so the degree of $f - y$
is equal to the degree of the constant, which is 0.

Exercise 4 Given $f: S^{n-1} \to R^n - \{0\}$, you can find $\delta > 0$ such that if $\|f - g\| < \delta$, then
$f \sim g$ in $R^n - \{0\}$; hence $\deg f = \deg g$.

Exercise 5 Let $f: S^{n-1} \to R^n$. The function $d(y) = \deg(f - y)$ is constant on each connected
component of $R^n - f(S^{n-1})$. [Hint: The previous exercise shows that
$d(y)$ is continuous.]




The use of the degree provides a very simple proof of the inverse-function
theorem and at the same time an improvement.

THEOREM 4.7 Let $f: R^n \to R^n$ be continuous on a neighborhood of $a$ and differentiable at $a$
with $df(a)$ nonsingular. Then $f$ maps each neighborhood of $a$ on a neighborhood
of $f(a)$.

Proof It simplifies the notation and is clearly no restriction to assume that
$a = f(a) = 0$. Let $T = df(0)$, and for each $r > 0$ set

\[ f_r(x) = f(rx) \quad \text{and} \quad g_r(x) = T(rx) = rTx, \qquad x \in S^{n-1}. \]

The idea is to show that if $r$ is small, then $f_r \sim g_r$ in $R^n - \{0\}$.
To see this choose $m > 0$ so that

\[ |Tx| \ge m|x|, \]

then $\epsilon$ with $0 < \epsilon < m$, and finally $r_0$ so that

\[ |f(x) - Tx| \le \epsilon|x| \quad \text{if } |x| \le r_0. \]

Now consider the homotopy between $f_r - y$ and $g_r - y$ given by

\[
\begin{aligned}
F(t, x) &= t(f_r(x) - y) + (1 - t)(g_r(x) - y) \\
        &= t(f_r(x) - g_r(x)) + g_r(x) - y, \qquad 0 \le t \le 1.
\end{aligned}
\]

If $r < r_0$ and $|y| < (m - \epsilon)r$, then $F(t, x) \neq 0$, for we have

\[ |t(f_r(x) - g_r(x))| \le \epsilon r \quad \text{and} \quad |g_r(x)| \ge mr. \]

Thus, if $r < r_0$ and $|y| < (m - \epsilon)r$, then $(f_r - y) \sim (g_r - y)$ in $R^n - \{0\}$;
consequently, $\deg(f_r - y) = \deg(g_r - y)$.

Next we show that $\deg(g_r - y) \neq 0$. Since $|g_r(x)| \ge mr$, the
ball $B(0; mr)$ is contained in $R^n - g_r(S^{n-1})$; so by Exercise 5 we have
$\deg(g_r - y) = \deg g_r$. And $\deg g_r \neq 0$ by Exercise 1 and Theorem 4.3.

Now the proof is finished, for Theorem 4.6 shows that

\[ f(B(0; r)) \supset B(0; (m - \epsilon)r) \quad \text{if } r < r_0. \tag{1} \]


Remark 2 Note that formula (1) [in the case where $df(0) = I$ and hence $m = 1$] is just
what was needed in proving the formulas for changing variables in multiple
integrals. We could now go back to these formulas and replace the assumption
that $f$ is of class $C^1$ almost everywhere by the assumption that $f$ is just differentiable
almost everywhere. (But we still need to assume that $f$ is absolutely continuous,
of course.) One case of particular interest is when $f$ is Lipschitzian.
It is plain that a Lipschitz function is absolutely continuous, and the important
theorem of H. Rademacher says that a Lipschitz function is differentiable
almost everywhere. This is done in Section 5.




Exercise 6 Theorem 4.7 gives half of the inverse-function theorem, but it does not assert
that $f$ is one to one on sufficiently small neighborhoods of $a$. What about this
assertion?

A retraction of a metric space $X$ on a subset $A$ is a continuous function
$R: X \to A$ such that $R(a) = a$ for each $a \in A$.

THEOREM 4.8 There is no retraction of the ball $B(a; 1)$ on the sphere $S(a; 1)$.

Proof It is enough to deal with the unit ball and sphere. To get a contradiction
suppose that there is a retraction $R$ of $B(0; 1)$ on $S(0; 1)$. The restriction
of $R$ to $S(0; 1)$ is the identity and its degree is 1. Therefore, by Theorem
4.6 there must be a point $x \in B(0; 1)$ with $R(x) = 0$, which is impossible
since all values of $R$ lie on $S(0; 1)$.

THEOREM 4.9 (Brouwer Fixed-Point Theorem) If $F: B(0; 1) \to B(0; 1)$ is continuous,
then there is a point $x \in B(0; 1)$ with $F(x) = x$.

Proof We shall suppose that there is no fixed point and get a contradiction by
producing a retraction of the ball on the sphere. Geometrically, $R(x)$ is
the point where the half-line from $F(x)$ through $x$ meets the sphere $S^{n-1}$.
Analytically, it is defined by

\[ R(x) = F(x) + t(x - F(x)), \tag{2} \]

where $t$ is the positive number such that

\[ |F(x) + t(x - F(x))| = 1. \tag{3} \]

Exercise 7 Find the positive solution of (3), put it back into (2), and deduce that $R$ really
is continuous. [Hint: Square both sides of (3) and write things out with inner
products. You will get a quadratic equation for $t$.]
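For readers who want to see the construction in action, here is a minimal sketch of the map $R$ following the hint: squaring (3) gives a quadratic equation in $t$, and the larger root is taken. The function name `retract` and the sample points are illustrative; the function receives a point `x` and the value `Fx` of $F$ at that point, and it assumes that both lie in the closed unit ball and that they are different.

```python
import math

def retract(x, Fx):
    """Point where the half-line from Fx through x meets the unit sphere.

    With d = x - Fx, squaring |Fx + t d| = 1 gives
        |d|^2 t^2 + 2 (Fx, d) t + (|Fx|^2 - 1) = 0,
    and the root with the plus sign is the positive one when |Fx| <= 1.
    """
    d = [xi - fi for xi, fi in zip(x, Fx)]
    dd = sum(v * v for v in d)                 # |d|^2, positive since x != Fx
    fd = sum(f * v for f, v in zip(Fx, d))     # inner product (Fx, d)
    ff = sum(f * f for f in Fx)                # |Fx|^2
    t = (-fd + math.sqrt(fd * fd + dd * (1.0 - ff))) / dd
    return [f + t * v for f, v in zip(Fx, d)]

inner = retract([0.2, 0.1], [-0.3, 0.4])      # x inside the ball
on_sphere = retract([0.6, 0.8], [0.1, -0.2])  # x already on the sphere
print(inner, math.hypot(*inner))
print(on_sphere)
```

The first result has length 1, and the second reproduces the point $(0.6, 0.8)$, which illustrates that points of the sphere are left fixed.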


THEOREM 4.10 (Fundamental Theorem of Algebra) Every nonconstant complex polynomial
has a complex zero.

Proof If $p(z) = \sum_{k=0}^{m} a_k z^k$, $a_m \neq 0$, we can divide through by $a_m$ and consider

\[ P(z) = z^m + \sum_{k=0}^{m-1} b_k z^k \]

instead. The job is to show that $P(z) = 0$ for some $z$. Let

\[ f_r(x) = P(rx) \quad \text{and} \quad g_r(x) = (rx)^m, \qquad x \in S^1, \]

and consider the homotopy

\[ F(t, x) = t f_r(x) + (1 - t) g_r(x) = t(f_r(x) - g_r(x)) + g_r(x), \qquad 0 \le t \le 1. \]

The problem is to show that if $r$ is large enough then $F$ does not
vanish. If $F(t, x) = 0$, then $f_r(x) - g_r(x) = -g_r(x)/t$; therefore, since
$t \le 1$,

\[ |f_r(x) - g_r(x)| \ge |g_r(x)|. \tag{4} \]

On the one hand, $|g_r(x)| = r^m$. On the other, if $M$ is the largest of the
$|b_k|$ and $r > 1$, then $|f_r(x) - g_r(x)| \le Mmr^{m-1}$; so if $r > 1$ and $r > Mm$,
then (4) is obviously impossible.

What we have is that if $r > 1$ and $r > Mm$, then $f_r \sim g_r$ in $R^2 - \{0\}$;
so $\deg f_r = \deg g_r = \deg g_1$, the last equality coming from Exercise 1.

Exercise 8 Go back to the initial definition of the degree and show that $\deg g_1 = m$, where
$g_1(x) = x^m$.

Now what we have is that if $r > 1$ and $r > Mm$, then $\deg f_r = m \neq 0$, so
Theorem 4.6 shows that there is a point $z$ with $|z| < r$ and $P(z) = 0$.
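The degree count in this proof can be observed numerically, again taking for granted that for maps of the circle into the punctured plane the degree is the winding number about the origin. The sketch below uses a made-up cubic with coefficients `b` (hypothetical values chosen for illustration), picks a radius $r$ with $r > 1$ and $r > Mm$, and compares the winding numbers of $P$ and of $z^m$ along the circle $|z| = r$.

```python
import cmath
import math

def winding_number(values):
    """Winding number about 0 of a closed polygon given by its complex vertices."""
    total = 0.0
    for z0, z1 in zip(values, values[1:]):
        total += cmath.phase(z1 / z0)  # small change of argument between samples
    return round(total / (2 * math.pi))

def P(z, b):
    """Monic polynomial z^m + b_{m-1} z^{m-1} + ... + b_0 with b = [b_0, ..., b_{m-1}]."""
    m = len(b)
    return z ** m + sum(bk * z ** k for k, bk in enumerate(b))

b = [2.0, -1.0, 0.5]             # hypothetical coefficients b_0, b_1, b_2, so m = 3
m = len(b)
M = max(abs(bk) for bk in b)
r = max(1.0, M * m) + 1.0        # guarantees r > 1 and r > M m
steps = 2000
circle = [r * cmath.exp(2j * math.pi * k / steps) for k in range(steps + 1)]
deg_f = winding_number([P(z, b) for z in circle])
deg_g = winding_number([z ** m for z in circle])
print(r, deg_f, deg_g)
```

Both winding numbers come out equal to $m = 3$, in line with the homotopy argument.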


In many of the theorems of this section it is not essential to have a true ball
or sphere but suffices to have a set that is equivalent to a ball or sphere in the
following sense:

DEFINITION 4.11 The metric space $X$ is homeomorphic to the metric space $Y$ if there is a one to
one function $\varphi$ from $X$ onto $Y$ such that $\varphi$ and $\varphi^{-1}$ are both continuous. Such
a $\varphi$ is called a homeomorphism from $X$ onto $Y$.

Exercise 9 Any compact connected smooth one-dimensional manifold is homeomorphic
to $S^1$.

Exercise 10 Let $\|x\| = \max|x_i|$, and let

\[ \varphi(x) = \frac{\|x\|}{|x|}\, x, \qquad \varphi(0) = 0. \]

Show that $\varphi$ is a homeomorphism of the unit cube $Q(0; 1)$ on the unit ball
$B(0; 1)$ and of the surface of $Q$ on $S^{n-1}$.

Exercise 11 Prove the Brouwer fixed-point theorem for any metric space $X$ that is homeomorphic
to the ball $B(0; 1)$.


Exercise 12 Prove Theorem 4.4 for the surface of the cube $Q$. (Note that this theorem
would not make sense for an arbitrary metric space homeomorphic to $S^{n-1}$.)

Exercise 13 Let $M$ be a smooth manifold of dimension $n - 1$ and suppose that there is a
homeomorphism $\varphi$ of $M$ on $S^{n-1}$ that is regular at each point. Show that $M$
has no nonvanishing tangent vector field if $n$ is odd and that it does have one
if $n$ is even.

Exercise 14 Use Exercise 13 to show that there is no regular homeomorphism of the torus
on the sphere $S^2$. (As a matter of fact, there is no homeomorphism at all.
Unlike the one-dimensional case, there are infinitely many essentially different
compact connected smooth two-dimensional manifolds.)

Exercise 15 Let $X$ and $Y$ be metric spaces that are homeomorphic to $S^{n-1}$, and let $\varphi: X \to S^{n-1}$
and $\psi: Y \to S^{n-1}$ be homeomorphisms. If $f$ is a continuous function from $X$ to
$Y$, set $f_{\varphi\psi} = \psi \circ f \circ \varphi^{-1}$. The natural way to try to define the degree of $f$ is to
define it to be the degree of $f_{\varphi\psi}$. To what extent does this depend on $\varphi$ and $\psi$?


5 CHANGE OF VARIABLE REVISITED

The Brouwer degree, via Theorem 4.7, allows the elimination of the $C^1$ hypothesis
in the formula for changing variables in multiple integrals.

THEOREM 5.1 If $\varphi: R^n \to R^n$ is absolutely continuous and differentiable a.e. on the open set
$\Omega$, and $\det d\varphi$ is locally integrable, then

\[ \int N(E, y) f(y)\, dy = \int_E f \circ \varphi(x)\, |\det d\varphi(x)|\, dx \tag{1} \]

for every nonnegative measurable function $f$ defined a.e. on $\Omega$ and every measurable
set $E \subset \Omega$.

In the proof it can be assumed that $E$ is bounded and that $\bar E \subset \Omega$. Moreover,
if (1) holds for a sequence of disjoint sets $E$, then clearly it holds for the
union. This will make it possible to discard various sets along the way. The
main step in the proof is to show that (1) holds when $f = 1$, i.e., to show that

\[ \int N(E, y)\, dy = \int_E |\det d\varphi(x)|\, dx. \tag{2} \]

The general statement follows directly, and by the usual argument, from the
case where $f$ is the characteristic function of a measurable set $Y$. In this case,
$f \circ \varphi$ is the characteristic function of $\varphi^{-1}(Y)$ and $N(E, y) f(y) = N(E', y)$, where
$E' = E \cap \varphi^{-1}(Y)$, so (1) reduces to (2) with $E$ replaced by $E'$. There is a rub,
however, that makes necessary a little maneuvering: $\varphi^{-1}(Y)$ is not necessarily
measurable. More generally, $f \circ \varphi$ is not necessarily measurable, though, as
follows from the proof, $f \circ \varphi\, |\det d\varphi|$ is measurable.


LEMMA 5.2 If $d\varphi$ is singular a.e. on the set $A$, then $|\varphi(A)| = 0$.

Proof Because of the absolute continuity of $\varphi$ it can be assumed that $d\varphi$ is singular
everywhere on $A$, and it can be assumed that $A$ is bounded (but it is not
assumed that $A$ is measurable). Let $G \supset A$ be open with $|G| < |A| + \epsilon$.
In the half of the proof of Theorem 4.3, Chapter 14, dealing with $d\varphi(a)$
singular, only the differentiability at $a$ is used. Consequently, that
theorem shows that each point of $A$ is the center of a ball $B$ with arbitrarily
small radius such that $|\varphi(B)| < \epsilon|B|$. The family of such balls, which
are also contained in $G$, covers $A$ in the sense of Vitali. By Vitali's
theorem there is a disjoint sequence $B_k$ so that

\[ A \subset \bigcup_{k=1}^{\infty} B_k \cup N, \qquad |N| = 0. \]

Therefore,

\[ |\varphi(A)| \le \sum_k |\varphi(B_k)| + |\varphi(N)| \le \epsilon|G| < \epsilon(|A| + \epsilon), \]

by the absolute continuity of $\varphi$. Since $\epsilon$ is arbitrary, the lemma follows.


We begin by throwing out some subsets of E. By Theorem 5.2, Chapter 
15, we have 


If |e(E)| = 0, then | N(E, y) dy = 0. (3) 


Therefore, (2) holds whenever E has measure 0. We throw out the set 
where 9 is not differentiable and the set where |det do| is not the deriva- 
tive of its integral. By Lemma 5.2 and statement (3), (2) holds on the 
set where dg is singular, and we throw out that set also. Now we will 
prove (2) under the additional assumptions that 9 is differentiable every- 
where on £, do is nonsingular everywhere on £, and |det do| is the 
derivative of its integral everywhere on FE. E of course is bounded and 
measurable with E C Q. 

Let $\epsilon > 0$ be given. Use the local integrability of $|\det d\varphi|$ to find an
open $G \supset E$ with $|G| < |E| + \epsilon$ and

$$\int_G |\det d\varphi(x)|\,dx - \int_E |\det d\varphi(x)|\,dx < \epsilon. \qquad (4)$$

Let $a \in E$, and let $\psi = d\varphi(a)^{-1}\varphi$, so that $d\psi(a) = I$. If $r$ is sufficiently
small, then by Theorem 4.7 and the definition of the differential,

$$B(b, (1-\epsilon)r) \subset \psi(B(a, r)) \subset B(b, (1+\epsilon)r), \qquad b = \psi(a).$$


Therefore,

$$(1-\epsilon)^n \le \frac{|\psi(B(a, r))|}{|B(a, r)|} \le (1+\epsilon)^n,$$

and since $|\varphi(A)| = |\det d\varphi(a)|\,|\psi(A)|$,

$$|\det d\varphi(a)|(1-\epsilon)^n|B(a, r)| \le |\varphi(B(a, r))| \le |\det d\varphi(a)|(1+\epsilon)^n|B(a, r)|. \qquad (5)$$

Since $|\det d\varphi(a)|$ is the derivative of the integral of $|\det d\varphi(x)|$, it is also
true that for small $r$

$$\Bigl|\,|\det d\varphi(a)|\,|B(a, r)| - \int_{B(a, r)} |\det d\varphi(x)|\,dx\,\Bigr| < \epsilon|B(a, r)|.$$

Combining the last two inequalities we find that $|\varphi(B(a, r))|$ lies between

$$(1 \pm \epsilon)^n \int_{B(a, r)} |\det d\varphi(x)|\,dx \pm \epsilon|B(a, r)| \qquad (6)$$

for all sufficiently small $r$.

This shows that for any positive integer $k$, the balls of diameter less
than $1/k$ with center in $E$, contained in $G$, and satisfying (6), cover $E$ in
the sense of Vitali. For each $k$, let $F_k$ be a disjoint sequence of such
balls covering almost all of $E$. According to formula (3), Section 5,
Chapter 15, we have

$$\int N(E, y)\,dy = \lim_{k\to\infty} \sum_{B \in F_k} |\varphi(B)|.$$

According to formula (2) of that same section, the integral of $N(E, y)$
lies between

$$(1-\epsilon)^n \int_E |\det d\varphi(x)|\,dx - \epsilon|G|$$

and

$$(1+\epsilon)^n \int_G |\det d\varphi(x)|\,dx + \epsilon|G|,$$

which proves formula (2), because of the conditions on $G$.


LEMMA 5.3  If $|\varphi(A)| = 0$, then $N(A, y) = 0$ a.e. and $d\varphi$ is singular a.e. on $A$.


Proof  If $A$ is measurable, the first conclusion follows from formula (3), and the
second from the first and formula (2). If $A$ is not measurable, let $B'$ be
a $G_\delta$ of measure 0 containing $\varphi(A)$, and let $A' = \varphi^{-1}(B')$. Then $A'$ is
measurable and $\varphi(A') = B'$, so the conclusions hold for $A'$, and therefore
for the smaller set $A$.


End of Proof of the Theorem  As stated at the beginning, it is sufficient to
prove formula (2) with $E$ replaced by $E' = E \cap \varphi^{-1}(Y)$, where $Y$ is measurable.
If $A$ is the set where either $\varphi$ is not differentiable or $d\varphi$ is singular, then

$$N(E', y) = N(E' \cap A, y) + N(E' - A, y). \qquad (7)$$

By the absolute continuity of $\varphi$ and Lemma 5.2, $|\varphi(A)| = 0$, and then by
Lemma 5.3, $N(E' \cap A, y) = 0$ a.e.

Now, $E' - A$ is measurable. Let $Y'$ be a $G_\delta$ containing $Y$ with
$|Y' - Y| = 0$. If

$$X' = \varphi^{-1}(Y') \quad\text{and}\quad A' = E \cap \varphi^{-1}(Y' - Y) - A,$$

then $E' - A = (E \cap X') - A - A'$. Now $\varphi(A') \subset Y' - Y$, so $|\varphi(A')| = 0$.
By Lemma 5.3, $d\varphi$ is singular a.e. on $A'$, but $A'$ is contained in the
set where $d\varphi$ is nonsingular, so $|A'| = 0$; and obviously $(E \cap X') - A$
is measurable.

Since $E' - A$ is measurable, formula (2) holds with $E$ replaced by
$E' - A$, and we have just seen that neither side changes when the latter
is replaced by $E'$.


THEOREM 5.4  (H. Rademacher) If $\varphi: R^n \to R^n$ is locally Lipschitzian on the open
set $\Omega$, then

$$\int N(E, y) f(y)\,dy = \int_E f\circ\varphi(x)\,|\det d\varphi(x)|\,dx$$

for every nonnegative measurable function $f$ defined a.e. on $\Omega$ and every
measurable set $E \subset \Omega$.


This is immediate from Rademacher's differentiability theorem, as the absolute
continuity of $\varphi$ and local integrability of $\det d\varphi$ are evident.


Remark  When $\varphi$ is one to one the proof of Theorem 5.1 is much simpler: almost
identical to that of the earlier theorem on change of variable, Theorem 4.5,
Chapter 14. In this case, $\nu(E) = |\varphi(E)|$ is an absolutely continuous regular
Borel measure, so all that is needed is the fact that $D\nu = |\det d\varphi|$ a.e., which
can be obtained from Lemma 5.2 and formula (5). The local integrability of
$\det d\varphi$ comes automatically from the local integrability of $D\nu$. In consequence,
it also comes automatically when $\varphi$ is locally one to one. ($\varphi(x) = x\sin(1/x)$ is
an example where $\det d\varphi$ is not locally integrable.)

One reason for not assuming that $\varphi$ is one to one appears in the following
results on Lipschitz changes of variable. In his famous articles on the extension
of differentiable functions, H. Whitney gave in passing a simple formula for
extending Lipschitz functions.


Exercise 1  Let $f$ be a real-valued Lipschitz function defined on a subset $A$ of a
metric space $X$ and with Lipschitz constant $M$. Then

$$F(x) = \sup\{f(a) - M d(x, a) : a \in A\}$$

is a Lipschitz extension of $f$ to $X$ with the same Lipschitz constant.
Consequently, a Lipschitz function from any subset of $R^n$ to $R^m$ has a Lipschitz
extension to $R^n$.


THEOREM 5.4  If $\varphi: R^n \to R^n$ is Lipschitz on a measurable set $E$, then for any
nonnegative measurable $f$ defined a.e. on $E$

$$\int N(E, y) f(y)\,dy = \int_E f\circ\varphi(x)\,|\det d\varphi(x)|\,dx,$$

where $d\varphi$ is the differential of any Lipschitz extension of $\varphi$.


Remark  Even if $\varphi$ is one to one on $E$, there may be no one-to-one Lipschitz
extension, so this theorem requires Theorem 5.1 without a one-to-one assumption.
On the other hand, the theorem shows that $\det d\varphi$ is independent a.e. of the
extension of $\varphi$.


Exercise 2  Discuss whether or not the last statement is surprising.
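
Whitney's formula $F(x) = \sup\{f(a) - M d(x, a) : a \in A\}$ from Exercise 1 is easy to experiment with when $A$ is finite. The following is a minimal sketch, not a proof: the sample points, the sample function, and the helper names `whitney_extension` and `euclid` are illustrative assumptions, and $M$ is taken to be the smallest Lipschitz constant of $f$ on the finite set.

```python
import math

def whitney_extension(points, values, M, dist):
    """Return F(x) = sup{ f(a) - M * d(x, a) : a in A } for a finite set A."""
    def F(x):
        return max(fa - M * dist(x, a) for a, fa in zip(points, values))
    return F

def euclid(x, y):
    return math.dist(x, y)

# Hypothetical sample data: f(a) = |a_1| - |a_2| on a few points of R^2.
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (2.0, 2.0)]
f_on_A = [abs(a[0]) - abs(a[1]) for a in A]

# Smallest Lipschitz constant of f on the finite set A.
pairs = list(zip(A, f_on_A))
M = max(abs(fa - fb) / euclid(a, b)
        for i, (a, fa) in enumerate(pairs)
        for b, fb in pairs[i + 1:])

F = whitney_extension(A, f_on_A, M, euclid)

# F agrees with f on A ...
agrees = all(abs(F(a) - fa) < 1e-12 for a, fa in pairs)

# ... and keeps the Lipschitz constant M on a grid of test points.
grid = [(-3 + 0.5 * i, -3 + 0.5 * j) for i in range(13) for j in range(13)]
worst = max(abs(F(x) - F(y)) / euclid(x, y)
            for x in grid for y in grid if x != y)
print(agrees, worst <= M + 1e-12)
```

Applying the same formula to each coordinate of a map into $R^m$ gives a Lipschitz extension, though in general with a larger constant than $M$.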


17 Extension of Differentiable Functions


1 INTRODUCTION


Extension problems take the following general form. Let $X$ and $Y$ be metric
spaces and let $F(X, Y)$ be a class of functions from $X$ to $Y$. Let $A$ be a
subset of $X$, and let $F(A, Y)$ be a class of functions from $A$ to $Y$. The extension
problem is to determine whether each function in $F(A, Y)$ has an extension in
$F(X, Y)$.

The first useful extension theorem was probably the classical theorem of
Tietze.


THEOREM 1.1  Let $A$ be a closed subset of the metric space $X$ and let $Y$ be an
interval of real numbers. Every continuous function from $A$ to $Y$ has a
continuous extension from $X$ to $Y$.


Exercise 1  If $Y$ is the closed interval $0 \le y \le b$, one extension is the sum of the
infinite series in the last part of the proof of Stone-Weierstrass.


Exercise 2  If $Y$ is the half open interval $0 \le y < b$, let $F$ be the extension given
by Exercise 1, let $B$ be the set where $F = b$, and let $e$ be continuous with values 1
on $A$, 0 on $B$, and between 0 and 1 elsewhere. Then $eF$ is the required extension.


Exercise 3  If $Y$ is the open interval $-b < y < b$, the positive and negative parts of
the function can be extended separately by Exercise 2.


The theorem results from the fact that every interval is homeomorphic
to one of the above.


In one of his pioneering papers on extension, H. Whitney gave in passing
a simple formula for extending real-valued Lipschitz functions.


THEOREM 1.2  Let $f$ be a real-valued Lipschitz function with Lipschitz constant $M$.
Then the function

$$F(x) = \sup\{f(a) - M d(x, a) : a \in A\}$$

is an extension of $f$ with the same Lipschitz constant. If $f$ has values in $[a, b]$,
then $G(x) = \min(\max(F, a), b)$ is an extension with the same Lipschitz
constant and with values in $[a, b]$.


Exercise 4  Show that Whitney's formula works.


Throughout the chapter, as in the above examples, the functions will be
real-valued, but there are important extension theorems for functions with
values in other spaces $Y$. Here are a couple of them. The first, a result of
M. Kirszbraun, is given without proof. The second is a special case of a classical
result in dimension theory.


THEOREM 1.3  Every Lipschitz function from a subset of $R^m$ to $R^n$ has a Lipschitz
extension with the same Lipschitz constant.


The Euclidean spaces are quite special in this respect. Usually, Lipschitz
functions do not extend with the same constant. (Whitney's formula gives a
Lipschitz extension, but not one with the same constant unless $n = 1$.)


THEOREM 1.4  If $m \le n$, then every continuous function from a closed subset of the
sphere $S^m$ to the sphere $S^n$ has a continuous extension with values in $S^n$. If
$m > n$, this is not so.


Proof  If $f$ is defined on $A$, first use Tietze to extend each coordinate function
in order to obtain an extension $F_1$ with values in $R^{n+1}$. If $F_1$ does not
take the value 0, then $F = F_1/|F_1|$ is the required extension. If $F_1$ does
take the value 0, fix $\epsilon$ (how small to be determined presently), and use
Stone-Weierstrass to find $F_2$ of class $C^1$ with $\|F_2 - F_1\| < \epsilon$, the norm
being the maximum. By Corollary 2.9 of Chapter 13 the range of $F_2$
has measure 0, so there exist points $y$ not in the range satisfying $|y| < \epsilon$.
Choose such a $y$ and set $F_3(x) = F_2(x) - y$, and $F_4 = F_3/|F_3|$. It is
easy to see that for any $\theta \in S^n$ and any $z \ne 0$, $|\theta - z/|z|| \le 2|\theta - z|$, so
that $\|F_4 - f\|_A \le 4\epsilon$. Thus, if $g = F_4 - f$ on the set $A$, then the
maximum of $|g|$ on $A$ is at most $4\epsilon$, and we can use Tietze again to get an
extension $G$ with $\|G\| \le 4\epsilon\sqrt{n+1}$. Now $F_5 = F_4 - G$ is an extension of $f$,
and it cannot take the value 0 if $4\epsilon\sqrt{n+1} < 1$, since $F_4$ has absolute
value 1 everywhere. Thus $F = F_5/|F_5|$ is the required extension.


If $m = n + 1$, take $A = S^n$ to be the equator in $S^{n+1}$, and $f$ to be
the identity function. An extension of $f$ would give a retraction of the
upper hemisphere in $S^{n+1}$, which is the same as $B^{n+1}$, onto $S^n$, which is
impossible by Theorem 4.8 of Chapter 16.


Exercise 5  The sphere $S^m$ in Theorem 1.4 can be replaced by $R^m$. (Note that
although $R^m$ is homeomorphic to a subset of $S^m$, the image of a closed unbounded
set cannot be closed, so this result is not a completely trivial consequence of
Theorem 1.4.)


There are many different classes of differentiable functions that play
important roles in the various branches of analysis. The most immediate of
these are the classes $C^m$ themselves. These have been defined only for functions
on open sets, while for the purposes of extension it is necessary to expand the
definition. There will be a basic open set $\Omega$, and $A$ will be the union of $\Omega$ and
a part of the boundary.


DEFINITION 1.5  Suppose that $A \subset \overline{A^\circ}$. Then $C^m(A)$ consists of the functions in
$C^m(A^\circ)$ with the following property: each point of $A$ has a neighborhood $G$ such
that all derivatives of order $\le m$ are uniformly continuous on $G \cap A^\circ$.


Exercise 6  If $u \in C^m(A)$, then each derivative of order $\le m$ has a unique
continuous extension to $A$.


The symbol $D_i u$ is used for the extension to $A$ as well as for the original
derivative.


The extension problem for the classes $C^m$ was solved in a very satisfying
way in the work of Whitney mentioned above. Whitney gave a general
definition of "class $C^m$ on an arbitrary closed set in $R^n$", showed that every
function of class $C^m$ on a closed set has an extension of class $C^m$ on $R^n$, and
showed that if $\Omega$ is open and has the Property W below, then $C^m(\bar\Omega)$ in his
sense is the same as $C^m(\bar\Omega)$ in the sense of Definition 1.5.


PROPERTY W  For each point $a$ of $\bar\Omega$ there are positive numbers $r$ and $L$ such that
any two points $x$ and $y$ of $\Omega \cap B(a, r)$ can be joined by an arc in $\Omega$ of length
$\le L|x - y|$.


Remark  These beautiful results of Whitney were largely ignored until Whitney
himself showed how they could be used to answer other difficult questions of
interest, at which point they became appreciated. Proofs and applications of
Whitney's theorems are now readily accessible, and will not be given here.


Some other classes of differentiable functions that play important roles in
analysis are as follows.


The space $BC^m(A)$ consists of the functions $u \in C^m(A)$ such that

$$\|u\|_m = \sup\{|D_i u(x)| : x \in A,\ |i| \le m\} < \infty.$$

For $0 < s \le 1$, set

$$|u|_s = \sup_{x \ne y} \frac{|u(x) - u(y)|}{|x - y|^s}.$$

The space $BC^{m,s}(A)$ consists of the functions in $BC^m(A)$ such that

$$\|u\|_{m,s} = \max\{\|u\|_m,\ |D_i u|_s : |i| = m\} < \infty.$$

These are called Hölder or Lipschitz spaces.
The Sobolev space $L^p_m(A)$ is the completion of the functions in $C^m(A)$ such
that

$$\|u\|_{m,p} = \Bigl(\sum_{|i| \le m} \int_A |D_i u|^p\,dx\Bigr)^{1/p} < \infty, \qquad 1 \le p < \infty.$$

When the role of the set $A$ must be stressed it is included as a subscript,
e.g., $\|u\|_{m,A}$.


Exercise 7  $BC^m(A)$ and $BC^{m,s}(A)$ are complete.


All of the above have useful variants, but for present purposes these will
suffice. The objective of the chapter is to construct an extension operator $E$,
depending only on $A$, which extends locally integrable functions on $A$ to locally
integrable functions on $R^n$ in such a way that if $u$ belongs to some space of the
above type, then $Eu$ belongs to the corresponding space on $R^n$. This kind of
result (even for the Sobolev spaces alone) requires a stronger condition on $A$
than the Whitney property, but a condition that is still relatively mild. The
basic condition is that the boundary is locally the graph of a Lipschitz function
in some set of coordinates.


DEFINITION 1.6  An open set $\Omega$ is a Lipschitz graph domain if for each point
$a \in \partial\Omega$ there exist a neighborhood $G$ of $a$, a choice of the coordinates, and a
Lipschitz function $f$ on $R^{n-1}$ such that

$$\Omega \cap G = \{x : x_n < f(x')\} \cap G.$$

THEOREM 1.7  The open set $\Omega$ is a Lipschitz graph domain if and only if for each
point $a \in \partial\Omega$


there exist r > O and an open convex cone V such that if T, = T (\ BO, r), 
then 


(Goud th — Gh — © Gg 6S 00, a — 5) = 


Exercise 8 Prove the theorem. 


The differentiable extensions will be accomplished by first localizing and
then performing a generalized reflection across the graph of a Lipschitz
function, and it is the second step that will involve difficulty. The set-up for
this second step is as follows. $A'$ is an open set in $R^{n-1}$, $f$ is a Lipschitz
function, which by Whitney's formula is defined throughout $R^{n-1}$,

$$A_+ = \{x : x' \in A',\ x_n \ge f(x')\}, \qquad A_- = \{x : x' \in A',\ x_n \le f(x')\},$$

and $A = A_+ \cup A_-$. The function $u$ is defined on $A_-$, a generalized reflection
$Ru$ is defined on $A_+$, and the extension $Eu$ is defined essentially to be $u$ on $A_-$
and $Ru$ on $A_+$. This requires a differentiable matching of $u$ and $Ru$. The
following lemma shows that differentiable matching is enough for differentiability.


LEMMA 1.8  Let $f$ be Lipschitz and let $u$ and $Ru$ be of class $C^m$ on $A_-$ and $A_+$. If
$D_i Ru = D_i u$ on $A_- \cap A_+$ for $|i| \le m$, then $Eu$ is of class $C^m$ on $A$.


Exercise 9  Prove the lemma. Hint: It is enough to treat the case $m = 1$. If $\theta$ is a
direction such that lines with direction $\theta$ meet the graph in only one point,
then $D_\theta Eu$ exists on the graph and has the common value of $D_\theta u$ and $D_\theta Ru$.
Therefore $D_\theta Eu$ is continuous. Since this is true for all $\theta$ in a certain open cone
(note Theorem 1.7), $Eu$ is $C^1$.


The next lemma has a certain technical use.


LEMMA 1.9  Let $u \in C^m(A_-)$ vanish for $x_n < f(x') - d$, and let $B' \subset A'$ be compact.
There is a sequence $u_k \in C^\infty(R^n)$ such that $\|u - u_k\|_{m,B_-} \to 0$.


Proof  For $h > 0$, let $\tau_h u(x) = u(x', x_n - h)$. Since the derivatives of $u$ of
order $\le m$ are uniformly continuous on $B_-$, it follows that

$$\|u - \tau_h u\|_{m,B_-} \to 0 \quad\text{as } h \to 0.$$

For each fixed $h$, the regularization procedure of Theorems 11.5 and
12.1 of Chapter 13 provides $u_h \in C^\infty(R^n)$ with $\|\tau_h u - u_h\|_{m,B_-} < h$.


Exercise 10  Lemma 1.9 only requires continuity of $f$.


Remark  There are two standard meanings for the symbol $D_i u$. In this book
$i = (i_1, \ldots, i_m)$ with $1 \le i_j \le n$, $|i| = m$, and

$$D_i u = D_{i_1} \cdots D_{i_m} u.$$


2 REFLECTION ACROSS HYPERPLANES 


The first differentiable reflection across hyperplanes was given by L.
Lichtenstein for functions of class $C^1$. Suppose $u$ is of class $C^1$ on
$A_- = \{x : x' \in A',\ x_n \le 0\}$, let $b_0$ and $b_1$ be distinct positive numbers, and set

$$Ru(x) = a_0 u(x', -b_0 x_n) + a_1 u(x', -b_1 x_n) \quad\text{on } A_+.$$

Then $Ru$ is of class $C^1$ on $A_+$, and $Ru$ matches $u$, along with the first
derivatives, if and only if $a_0$ and $a_1$ satisfy the equations

$$a_0 + a_1 = 1, \qquad -a_0 b_0 - a_1 b_1 = 1.$$

If $a_0$ and $a_1$ do satisfy these equations, then by Lemma 1.8 the function $Eu$
which is $u$ on $A_-$ and $Ru$ on $A_+$ is of class $C^1$ on $A$.

The Lichtenstein reflection for class $C^1$ was extended to $C^m$, $m < \infty$, by
M. Hestenes. If $u$ is of class $C^m$ on $A_-$ and $b_0, \ldots, b_m$ are distinct positive
numbers, set

$$Ru(x) = \sum_{\mu=0}^{m} a_\mu u(x', -b_\mu x_n) \quad\text{on } A_+.$$

Then $Ru$ is of class $C^m$ on $A_+$, and

$$D_i Ru(x) = \sum_{\mu=0}^{m} a_\mu D_i u(x', -b_\mu x_n)(-b_\mu)^\nu,$$

where $\nu$ is the order of differentiation in $x_n$. Thus $Ru$ matches $u$, along with all
derivatives of order $\le m$, if and only if $a_0, \ldots, a_m$ satisfy the equations

$$\sum_{\mu=0}^{m} a_\mu(-b_\mu)^\nu = 1 \quad\text{for } \nu = 0, \ldots, m. \qquad (1)$$

The matrix for this system of equations is the matrix $V(-b)$, where $V(z)
= V(z_0, \ldots, z_m)$ has $z_\mu^\nu$ in row $\nu$, column $\mu$, starting with row 0 column 0. Such
a matrix is called a Vandermonde matrix.


Exercise 1  If there is a linear relation between the rows of $V(z)$ with coefficients
$c_0, \ldots, c_m$, then the polynomial

$$p(x) = \sum_{\nu=0}^{m} c_\nu x^\nu$$

vanishes at the points $z_0, \ldots, z_m$, so all $c_\nu$ are 0 if these points are distinct.


By Exercise 1 the equations (1) have a unique solution. If this solution is
$a_0, \ldots, a_m$, then by Lemma 1.8 the function $Eu$ which is $u$ on $A_-$ and $Ru$ on
$A_+$ is of class $C^m$ on $A$.
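
The system (1) is small enough to solve exactly by machine. The sketch below is an illustration only: the helper names `solve_linear` and `hestenes_coefficients`, the choices $b = (1, 2)$ and $b = (1, 2, 3)$, and the test function $u$ are assumptions made for the example, and exact rational arithmetic is used so that the matching conditions can be checked without rounding.

```python
from fractions import Fraction

def solve_linear(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def hestenes_coefficients(b):
    """Coefficients a_0..a_m with sum_mu a_mu * (-b_mu)**nu == 1 for nu = 0..m."""
    m = len(b) - 1
    # Row nu, column mu of the Vandermonde matrix V(-b) holds (-b_mu)**nu.
    V = [[Fraction(-bm) ** nu for bm in b] for nu in range(m + 1)]
    return solve_linear(V, [1] * (m + 1))

# Lichtenstein's case m = 1 with the illustrative choice b_0 = 1, b_1 = 2.
a0, a1 = hestenes_coefficients([1, 2])
print(a0, a1)   # 3 -2

# The case m = 2, checked at x_n = 0 on u(t) = t**2 + t + 1 (a function of x_n only).
b = [1, 2, 3]
a = hestenes_coefficients(b)
u = lambda t: t * t + t + 1
du = lambda t: 2 * t + 1
d2u = lambda t: Fraction(2)
Ru0 = sum(am * u(0) for am in a)
dRu0 = sum(am * (-bm) * du(0) for am, bm in zip(a, b))
d2Ru0 = sum(am * (-bm) ** 2 * d2u(0) for am, bm in zip(a, b))
print(Ru0 == u(0), dRu0 == du(0), d2Ru0 == d2u(0))
```

With $b_0 = 1$, $b_1 = 2$ the two Lichtenstein equations give $a_1 = -2$ and $a_0 = 3$, which is what the solver returns.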
R. Seeley had the very nice idea of choosing the numbers $b_0, \ldots, b_m$ in
a special way so that, as $m$ goes to $\infty$, the solution to the system (1) converges
to a solution to the infinite system

$$\sum_{\mu=0}^{\infty} a_\mu(-b_\mu)^\nu = 1, \qquad \nu = 0, 1, 2, \ldots. \qquad (2)$$

To carry out Seeley's idea, an explicit formula for the solution to (1) is needed,
and this can be obtained through Cramer's Rule (Chapter 9, Section 11,
Exercise 9).

The determinant of the Vandermonde matrix $V(z)$ is calculated by
successively multiplying row $\nu - 1$ by $z_0$ and subtracting from row $\nu$, starting
from the bottom. In column $\mu$ this produces the entries $z_\mu^{\nu-1}(z_\mu - z_0)$, so
column 0 has a nonzero entry only in row 0, and in the minor of that entry
column $\mu$ has the factor $z_\mu - z_0$. The removal of these factors leaves a
Vandermonde matrix $V(z_1, \ldots, z_m)$. Therefore

$$\det V(z_0, \ldots, z_m) = \prod_{\mu > 0} (z_\mu - z_0)\,\det V(z_1, \ldots, z_m).$$

It follows by induction that

$$\det V(z) = \prod_{\nu > \mu} (z_\nu - z_\mu). \qquad (3)$$

When one of the columns in the Vandermonde matrix is replaced by a
column of 1's, the matrix remains Vandermonde. From equation (3) and
Cramer's rule it follows that the solution to the system (1) is

$$a_\mu = A_m(\mu) = \prod_{\nu \ne \mu} \frac{b_\nu + 1}{b_\nu - b_\mu}.$$

Seeley took

$$b_\mu = b^{\mu+1} - 1 \quad\text{with } b > 2.$$

In this case the solution to the system (1) becomes

$$a_\mu = A_m(\mu) = \prod_{\nu=0}^{\mu-1} \frac{1}{1 - b^{\mu-\nu}} \prod_{\nu=\mu+1}^{m} \frac{1}{1 - b^{\mu-\nu}}.$$

Since $b > 2$ it follows that $b^\nu - 1 > b^{\nu-1}$, and therefore

$$\Bigl|\prod_{\nu=0}^{\mu-1} \frac{1}{1 - b^{\mu-\nu}}\Bigr| \le b^{-\mu(\mu-1)/2}.$$

Since $-\log(1 - x) < 2x$ for $0 < x < 1/2$ it follows that

$$\prod_{\nu=\mu+1}^{m} \frac{1}{1 - b^{\mu-\nu}} < \exp\Bigl(2\sum_{\nu=1}^{\infty} b^{-\nu}\Bigr) < e^2.$$

It is plain that for fixed $\mu$ the sequence $(-1)^\mu A_m(\mu)$ is increasing, so we have


LEMMA 2.1  If $A_m(\mu)$ is the unique solution to the system (1), then

$$0 < (-1)^\mu A_m(\mu) \nearrow (-1)^\mu a_\mu < e^2 b^{-\mu(\mu-1)/2}$$

and the $a_\mu$ satisfy the equations (2).


Proof  It remains to be proved that the $a_\mu = a(\mu)$ satisfy the equations (2).
Think of the $A_m$ and $a$ as functions on the integers and the sums as
integrals with respect to counting measure. The estimate in the Lemma
allows use of the dominated convergence theorem.
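
The closed form for $A_m(\mu)$ can be checked exactly against the system (1). The sketch below rests on assumptions that should be read as such: it takes Seeley's choice to be $b_\mu = b^{\mu+1} - 1$, so that $A_m(\mu) = \prod_{\nu \ne \mu} 1/(1 - b^{\mu-\nu})$, it fixes the sample values $b = 3$ and $m = 6$, and it tests the bound $e^2 b^{-\mu(\mu-1)/2}$; the function names are illustrative.

```python
import math
from fractions import Fraction

def seeley_b(mu, b):
    """The choice b_mu = b**(mu + 1) - 1 (as read from the text)."""
    return Fraction(b) ** (mu + 1) - 1

def A(m, mu, b):
    """Closed form A_m(mu) = prod over nu != mu of 1 / (1 - b**(mu - nu))."""
    prod = Fraction(1)
    for nu in range(m + 1):
        if nu != mu:
            prod /= 1 - Fraction(b) ** (mu - nu)
    return prod

b, m = 3, 6
coeffs = [A(m, mu, b) for mu in range(m + 1)]

# The finite system (1): sum_mu a_mu * (-b_mu)**nu == 1 for nu = 0..m.
moments = [sum(a * (-seeley_b(mu, b)) ** nu for mu, a in enumerate(coeffs))
           for nu in range(m + 1)]
print(all(s == 1 for s in moments))

# Alternating signs, growth in m, and the bound e**2 * b**(-mu*(mu-1)/2).
signs_ok = all((-1) ** mu * a > 0 for mu, a in enumerate(coeffs))
grows = all(abs(A(m + 1, mu, b)) > abs(A(m, mu, b)) for mu in range(m + 1))
bound_ok = all(abs(a) < math.e ** 2 * float(b) ** (-mu * (mu - 1) / 2)
               for mu, a in enumerate(coeffs))
print(signs_ok, grows, bound_ok)
```

The rapid decay $b^{-\mu(\mu-1)/2}$ is what beats the growth of $b_\mu^\nu$ for each fixed $\nu$, and that is the domination needed to pass from (1) to the infinite system (2).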


DEFINITION 2.2  If $u$ is a function on $A_-$, then

$$Ru(x) = \sum_{\mu=0}^{\infty} a_\mu u(x', -b_\mu x_n) \quad\text{on } A_+$$

wherever the series converges.


LEMMA 2.3  If $u$ is continuous on $A_-$ and for each compact $K' \subset A'$ there are
constants $C$, $d$, and $\nu$ such that

$$|u(x)| \le C|x_n|^\nu \quad\text{for } x' \in K' \text{ and } x_n \le -d,$$

then the series for $Ru$ converges uniformly on compact subsets of $A_+$, and
$Ru$ is continuous on $A_+$, and matches $u$ to give a continuous function on $A$.


Exercise 2  Prove the lemma.


In order to obtain a single extension operator that can be applied to all
functions, we first cut off the function below $x_n = -2$. (If the operator is to
be applied only to functions that satisfy the condition in the lemma, along
with all relevant derivatives, then the cut off is not needed.) For the cut off
we fix once and for all a function $\psi \in C^\infty(R^n)$ which is 1 for $x_n \ge -1$ and is
0 for $x_n \le -2$.


DEFINITION 2.4  If $u$ is a function on $A_-$, then $Eu$ is the function which is $u$ on
$A_-$ and $R(\psi u)$ on $A_+$.


THEOREM 2.5  If $u$ is of class $C^m$ on $A_-$, then the extension $Eu$ is of class $C^m$ on
$A$. For each of the common norms in Section 1 there is a constant $C$ such that
$\|Eu\| \le C\|u\|$.


Proof  On any compact subset of $A_+^\circ$ the series for $R(\psi u)$ is finite, so $R(\psi u)$
is of class $C^m$ on $A_+^\circ$ and the derivatives are given by

$$D_i R(\psi u)(x) = \sum_{\mu=0}^{\infty} a_\mu D_i(\psi u)(x', -b_\mu x_n)(-b_\mu)^\nu \quad\text{for } |i| \le m,$$

where $\nu$ is the order of differentiation in $x_n$. Since the derivatives of $\psi u$
are bounded on compact subsets of $A_-$, the estimate in Lemma 2.1 gives
uniform convergence of the above series on compact subsets of $A_+$, and
this proves that $R(\psi u)$ is of class $C^m$ on $A_+$. The required matching then
follows from the equations (2), and Lemma 1.8 shows that $Eu$ is of class
$C^m$ on $A$.


As far as the norms go, the estimate in Lemma 2.1 gives, for example, 


[Elle < Cll 
with 
C= 6 ¥ b-Hu-D2(putl — ])m, 
3 pear — 1 


Exercise 3  Show that the above gives the required norm estimate for the norm on $BC^m$.


Exercise 4  Use the above to obtain the norm estimates for the Hölder and Sobolev
cases. (These estimates are proved for general Lipschitz graph reflections in
Sections 5 and 6, but they are very simple here.)


3 REGULARIZED DISTANCE


For many years I had the good luck to work closely with N. Aronszajn,
a wonderful mathematician who left a permanent mark in many areas of
modern analysis. The results in the rest of the chapter were joint work with
Aronszajn and R. Adams, but they are due mainly to Aronszajn.

At issue is to reflect, not across a hyperplane, but across the graph of a
function $x_n = f(x')$ defined on $R^{n-1}$. It is reasonable to expect that $x_n - f(x')$,
the vertical distance to the graph, should take over the previous role of $x_n$,
the vertical distance to the hyperplane. In this case, the analog of Seeley's
reflection formula is

$$Ru(x) = \sum_{\mu=0}^{\infty} a_\mu u(x', f(x') - b_\mu(x_n - f(x'))).$$

It can be shown that this works when $f$ has enough differentiability. However,
that differentiability also allows transformation of the graph to a hyperplane,
so nothing new is obtained. Aronszajn's idea was to rewrite the formula so
that $f$ appears only via the vertical distance:

$$Ru(x) = \sum_{\mu=0}^{\infty} a_\mu u(x', x_n - b^{\mu+1}(x_n - f(x'))),$$

to replace the vertical distance by a "regularized vertical distance" $\rho$, and to
use the reflection formula

$$Ru(x) = \sum_{\mu=0}^{\infty} a_\mu u(x', x_n - b^{\mu+1}\rho(x)).$$

This idea works whenever $f$ is a Lipschitz function. The theorem on regularized
distance is as follows.


THEOREM 3.1  Let $f$ be Lipschitz on $R^{n-1}$ with Lipschitz constant $M$, and let
$0 < \epsilon < 1$. There are constants $C_m$ and a function $\rho$ of class $C^\infty$ on $x_n > f(x')$
such that

(a) $(1 - \epsilon)(x_n - f(x')) \le \rho(x) \le x_n - f(x')$,
(b) $|D_i\rho(x)| \le C_m(x_n - f(x'))^{1-m}$ for $|i| = m$,
(c) $D_n\rho(x) \ge 1 - \epsilon$.

Remark  Properties (a) and (b) are sufficient for extension in the classes $C^m$,
$BC^m$, etc. but (c) is needed for integral estimates such as those in the Sobolev
norms.


Proof  Fix a function $e_1$ of class $C^\infty$ on $(0, \infty)$ which is decreasing, constant on
a neighborhood of 0, 0 for $t \ge 1$, and satisfying

$$\int_0^\infty t^{n-2} e_1(t)\,dt = \frac{1}{|S^{n-2}|}, \qquad (1)$$

where $|S^{n-2}|$ is the area of the unit sphere in $R^{n-1}$. Then $e(x') = e_1(|x'|)$ is of
class $C^\infty$ on $R^{n-1}$, is a decreasing function of $|x'|$, vanishes for $|x'| \ge 1$, and
has integral 1. It will be shown that numbers $c$ and $k$ can be chosen so that

$$\rho(x) = k\int (x_n - f(y'))^{2-n}\, e\Bigl(\frac{M(x' - y')}{c(x_n - f(y'))}\Bigr)\,dy' \qquad (2)$$

has the required properties. For the moment $c$ and $k$ remain undetermined,
but subject to $c < 1$.
Set $r = x_n - f(x')$ and $r_1 = x_n - f(y')$. In the range where $e$ is
not 0, $M|x' - y'| \le cr_1$, so that

$$|f(y') - f(x')| \le M|x' - y'| \le cr_1.$$

Hence $r \le (1 + c)r_1$, and similarly $r \ge (1 - c)r_1$. Therefore

$$\frac{1}{1+c} \le \frac{r_1}{r} \le \frac{1}{1-c}. \qquad (3)$$

In particular, $r_1$ is not 0 on the range of integration, so formula (2) makes
sense. If we set

$$z' = \frac{M(x' - y')}{cr_1}, \qquad (4)$$

formula (2) becomes

$$\rho(x) = k\int r_1^{2-n} e(z')\,dy'. \qquad (5)$$

Continuation of the proof requires formulas for the derivatives of $z'$ and
the Jacobian of the transformation (4) (as a change of variable from $y'$ to $z'$).


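Exercise 1  The derivatives of $z'$ are given by

$$\frac{\partial z_i}{\partial x_j} = \frac{M}{cr_1}\,\delta_{ij} \ \ (j < n), \qquad \frac{\partial z_i}{\partial x_n} = -\frac{z_i}{r_1}, \qquad (6)$$

$$\frac{\partial z_i}{\partial y_j} = \frac{M}{cr_1}\Bigl(-\delta_{ij} + \frac{c}{M}\,z_i\,\frac{\partial f}{\partial y_j}\Bigr). \qquad (7)$$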


Exercise 2  If $A$ is the $n \times n$ matrix $\{-\delta_{ij} + a_i b_j\}$, then $\det A = (-1)^n(1 - (a, b))$.
Consequently, the Jacobian of the transformation (4) is given by

$$J = (M/cr_1)^{n-1}\,[1 - (c/M)(z', \nabla f)] = (M/cr_1)^{n-1} J_1. \qquad (8)$$

Hint: $A$ is the matrix of the linear transformation $Tx = -x + (x, a)b$. Use
a new orthonormal basis in which $e_1 = a/|a|$.
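
The determinant identity in Exercise 2 can be confirmed on a sample before it is proved. The following is a small sketch in exact rational arithmetic; the vectors $a$ and $b$ are arbitrary sample data, and the Leibniz-formula helper `det` is only meant for small matrices.

```python
from fractions import Fraction
from itertools import permutations

def det(A):
    """Determinant by the Leibniz formula (fine for small matrices)."""
    n = len(A)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = Fraction(-1) ** inversions
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

def rank_one_matrix(a, b):
    """The matrix with entries -delta_ij + a_i * b_j."""
    n = len(a)
    return [[(-1 if i == j else 0) + a[i] * b[j] for j in range(n)]
            for i in range(n)]

a = [Fraction(1, 2), Fraction(-3), Fraction(2, 5), Fraction(7)]
b = [Fraction(4), Fraction(1, 3), Fraction(-2), Fraction(1, 7)]
n = len(a)
lhs = det(rank_one_matrix(a, b))
rhs = (-1) ** n * (1 - sum(x * y for x, y in zip(a, b)))
print(lhs == rhs)
```

Here $(a, b) = 6/5$ and $n = 4$, so both sides equal $-1/5$.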


The transformation (4) is Lipschitz with the Jacobian given by (8).
Since $|\nabla f| \le M$, and since $|z'| \le 1$ on the range of integration in (5),
it follows that

$$1 - c \le J_1 \le 1 + c, \qquad J_1 = 1 - (c/M)(z', \nabla f). \qquad (9)$$

The Rademacher Theorem 5.4 of Chapter 16 justifies using (4) as a
change of variable in (5). The result is

$$\rho(x) = k(c/M)^{n-1}\int r_1\,e(z')\,J_1^{-1}\,dz'. \qquad (10)$$


Proof of (a)  Using the right half of (3) and the left half of (9) we get

$$\rho(x) \le k(c/M)^{n-1}(1 - c)^{-2}\,r(x).$$

Consequently, the right half of (a) holds if

$$k = (M/c)^{n-1}(1 - c)^2. \qquad (11)$$

The left half of (a) will follow from (c).


Proof of (b)  For $|i| = m$, $D_i\rho$ is a sum of terms of the form

$$I_m = k\int r_1^{2-n-m}(M/c)^{|p|-|q|}\,D_p e(z')\,z'^q\,dy', \qquad |q| \le |p| \le m.$$

To verify this, assume it for a given value of $m$ and apply an additional
derivative in accordance with (6). The result is a sum of terms of the
same form with $m$ increased by 1. Making the change of variable (4)
and inserting the value of $k$ from (11), we get

$$I_m = (1 - c)^2\int r_1^{1-m}(M/c)^{|p|-|q|}\,D_p e(z')\,z'^q\,J_1^{-1}\,dz'.$$

Using the left half of (3) and the left half of (9), we get

$$|I_m| \le |B^{n-1}|(1 - c)(1 + c)^{m-1}(M/c)^{|p|-|q|}\,\|e\|_m\,r(x)^{1-m}. \qquad (12)$$


Proof of (c)  A simple calculation gives

$$D_n\rho(x) = -k\int r_1^{1-n}\bigl(|z'|e_1'(|z'|) + (n - 2)e_1(|z'|)\bigr)\,dy'.$$

The change of variable (4) gives

$$D_n\rho(x) = -(1 - c)^2\int \frac{|z'|e_1'(|z'|) + (n - 2)e_1(|z'|)}{J_1}\,dz'.$$

Since $e_1 \ge 0$, and $e_1' \le 0$, it follows from (9) that

$$D_n\rho(x) \ge -\frac{(1 - c)^2}{1 + c}\int |z'|e_1'(|z'|)\,dz' - (1 - c)(n - 2)\int e_1(|z'|)\,dz'.$$

According to (1) and integration by parts, the first integral is $1 - n$,
and the second one is 1. Inserting these values we get

$$D_n\rho(x) \ge (1 - c(2n - 3))(1 - c)/(1 + c) \ge 1 - (2n - 1)c.$$

Therefore, (c) holds, hence also the left half of (a), if $c = \epsilon/(2n - 1)$.


DEFINITION 3.2  Fix once and for all a function $e_1 \in C^\infty$ on $(0, \infty)$ which is
decreasing, constant on a neighborhood of 0, 0 for $t \ge 1$, and satisfying

$$\int_0^\infty t^{n-2} e_1(t)\,dt = \frac{1}{|S^{n-2}|},$$

and set $e(x') = e_1(|x'|)$. For $0 < \epsilon < 1$, let $c = \epsilon/(2n - 1)$. If $f$ is
Lipschitz with Lipschitz constant $M \ne 0$, the regularized distance to the
graph of $f$ is the function

$$\rho_\epsilon(x) = (M/c)^{n-1}(1 - c)^2\int (x_n - f(y'))^{2-n}\,e\Bigl(\frac{M(x' - y')}{c(x_n - f(y'))}\Bigr)\,dy'.$$


Exercise 3  If $f'(x') = f(x'/M)$, then $f'$ has Lipschitz constant 1. If $\rho'$ is the
corresponding regularized distance, then $\rho(x) = \rho'(Mx', x_n)$.


The final form of Theorem 3.1 is as follows.


THEOREM 3.3  There are constants $C_m$ depending only on $m$ and on the dimension $n$
such that if $f$ is Lipschitz with Lipschitz constant $M \ne 0$, then the regularized
distance to the graph of $f$ is of class $C^\infty$ on $x_n > f(x')$ and has the following
properties:

(a) $(1 - \epsilon)(x_n - f(x')) \le \rho(x) \le x_n - f(x')$.
(b) $|D_i\rho(x)| \le C_m M^{m'}\epsilon^{-m}(x_n - f(x'))^{1-m}$,
    where $m = |i|$ and $m'$ is the order in $x'$.
(c) $D_n\rho(x) \ge 1 - \epsilon$.


Proof  All that needs discussion is the new version of (b) in which the form of
the constants is made explicit. For $M = 1$, this is clear from (12),
and then for any $M \ne 0$ it follows from Exercise 3.
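
Property (a) of the regularized distance can be observed numerically in the plane. The sketch below is an illustration under several assumptions: it takes $n = 2$ (so the factor $r_1^{2-n}$ is 1 and $c = \epsilon/3$), it reads the regularized distance as $\rho(x) = (M/c)(1 - c)^2\int e\bigl(M(x_1 - y)/(c(x_2 - f(y)))\bigr)\,dy$, it uses one particular smooth decreasing profile for $e_1$, and it integrates by the midpoint rule. The names `profile`, `midpoint` and `regularized_distance` are hypothetical helpers.

```python
import math

def profile(t):
    """Smooth, decreasing, 1 for t <= 1/4 and 0 for t >= 1 (one possible e_1 shape)."""
    g = lambda s: math.exp(-1.0 / s) if s > 0.0 else 0.0
    up, down = g(1.0 - t), g(t - 0.25)
    return up / (up + down)

def midpoint(fn, lo, hi, n=4000):
    """Midpoint-rule approximation of the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(n))

# Normalise so that e(y) = e_1(|y|) has integral 1 over R (the case n = 2).
Z = 2.0 * midpoint(profile, 0.0, 1.0)
e = lambda y: profile(abs(y)) / Z

def regularized_distance(x1, x2, f, M, eps):
    """rho(x) for n = 2, where the factor r_1**(2 - n) equals 1."""
    c = eps / 3.0                           # c = eps / (2n - 1) with n = 2
    r = x2 - f(x1)
    half_width = c * r / (M * (1.0 - c))    # the integrand vanishes outside this window
    def integrand(y):
        r1 = x2 - f(y)
        return e(M * (x1 - y) / (c * r1)) if r1 > 0.0 else 0.0
    k = (M / c) * (1.0 - c) ** 2
    return k * midpoint(integrand, x1 - half_width, x1 + half_width)

eps, M = 0.5, 2.0
f = lambda y: M * abs(y)                    # Lipschitz with constant M, kink at 0
for x1, x2 in [(0.0, 1.0), (0.3, 1.0), (-1.0, 2.5)]:
    r = x2 - f(x1)
    rho = regularized_distance(x1, x2, f, M, eps)
    print(x1, x2, (1 - eps) * r <= rho <= r)

# Flat graph: here rho should equal (1 - c)**2 * r up to quadrature error.
flat = regularized_distance(0.0, 1.0, lambda y: 0.0, 1.0, eps)
print(abs(flat - (1 - eps / 3.0) ** 2) < 1e-6)
```

For a flat graph the integral can be done by hand and gives $\rho = (1 - c)^2 r$, which lies between $(1 - \epsilon)r$ and $r$ because $c = \epsilon/3$; the last line checks this case.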


Before going to the next section on reflections we record a couple of lemmas
for future use, but first, for given points $x$ and $y$ above the graph of $f$ we
construct a path joining them with length comparable to $|x - y|$, and no closer
to the graph than the points themselves. For any direction $\theta'$ in $R^{n-1}$, set

$$x(s) = (x' + s\theta',\ x_n + sM). \qquad (13)$$

It is easily checked that

$$x_n(s) - f(x'(s)) \ge x_n - f(x') \quad\text{for } s \ge 0. \qquad (14)$$


If $y' \ne x'$, let $\theta' = (y' - x')/|y' - x'|$, let $s_0 = |y' - x'|$, and let $z = x(s_0)$
so that

$$z = (y',\ x_n + M|x' - y'|).$$

If $y' = x'$, set $z = x$. According to (14), the path consisting of the segments
$[x, z]$ and $[z, y]$ lies above the graph, and it is clear that

$$|x - z| \le \sqrt{M^2 + 1}\,|y' - x'|, \qquad (15)$$

and that

$$|z - y| \le |y_n - x_n - M|y' - x'||. \qquad (16)$$

Consequently,

$$|x - z| + |z - y| \le 3\sqrt{M^2 + 1}\,|x - y|. \qquad (17)$$


LEMMA 3.4  The regularized distance $\rho = \rho_\epsilon$ is Lipschitz, satisfying

$$|\rho_\epsilon(y) - \rho_\epsilon(x)| \le C\epsilon^{-1}M_0^2|x - y|, \qquad M_0 = \max(M, 1),$$

where $C$ depends only on $n$.


Proof  By Theorem 3.3(b), any directional derivative of $\rho_\epsilon$ is bounded by
$C_1 M_0\epsilon^{-1}$, so, by (17), the integral over the path above is bounded as in the
lemma.


LEMMA 3.5  If $g(x) = D_{q_1}\rho_\epsilon \cdots D_{q_r}\rho_\epsilon$, $q = \sum |q_j|$, then

$$|g(x)| \le CM_0^q\epsilon^{-q}(x_n - f(x'))^{r-q}. \qquad (18)$$

If $x_n - f(x') \le y_n - f(y')$, then

$$|g(y) - g(x)| \le CM_0^{q+2}\epsilon^{-q-1}(x_n - f(x'))^{r-q-1}|y - x|. \qquad (19)$$

$C$ depends only on $q$, $r$, and $n$.


Proof  Formula (18) follows directly from Theorem 3.3(b), which gives also
that any directional derivative of $g$ is bounded by the same expression
with $q$ replaced by $q + 1$. Formula (19) then follows from integration of
the directional derivative over the two segments described above and
the inequalities (14) and (17).
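
The two-segment path used in these proofs goes from $x$ along the direction $(\theta', M)$ to the corner $z = (y', x_n + M|x' - y'|)$ and then vertically to $y$. The following sketch tests its two properties on random points: it stays at least as far above the graph as the lower of the two endpoints, and its length is at most $3\sqrt{M^2 + 1}\,|x - y|$. The sample function $f$, the sampling ranges and the helper names are assumptions made for the illustration.

```python
import math
import random

def connect(x, y, M):
    """Corner z of the two-segment path [x, z], [z, y]; points are (x', x_n)."""
    (xp, xn), (yp, yn) = x, y
    return (yp, xn + M * math.dist(xp, yp))

def height(p, f):
    """Vertical distance x_n - f(x') from the point p = (x', x_n) to the graph."""
    return p[1] - f(p[0])

def segment(p, q, t):
    """Point at parameter t in [0, 1] on the segment from p to q."""
    (pp, pn), (qp, qn) = p, q
    return (tuple(a + t * (b - a) for a, b in zip(pp, qp)), pn + t * (qn - pn))

def full_dist(p, q):
    return math.hypot(math.dist(p[0], q[0]), p[1] - q[1])

M = 1.5
# Hypothetical Lipschitz function on R^2 whose gradient has norm at most M.
f = lambda xp: M * math.sin(xp[0]) * math.cos(xp[1])

random.seed(0)
ok_height, ok_length = True, True
for _ in range(200):
    xp = (random.uniform(-3, 3), random.uniform(-3, 3))
    yp = (random.uniform(-3, 3), random.uniform(-3, 3))
    x = (xp, f(xp) + random.uniform(0.01, 2.0))
    y = (yp, f(yp) + random.uniform(0.01, 2.0))
    z = connect(x, y, M)
    floor = min(height(x, f), height(y, f))
    for a, b in ((x, z), (z, y)):
        for i in range(51):
            if height(segment(a, b, i / 50), f) < floor - 1e-9:
                ok_height = False
    length = full_dist(x, z) + full_dist(z, y)
    if length > 3 * math.sqrt(M * M + 1) * full_dist(x, y) + 1e-9:
        ok_length = False
print(ok_height, ok_length)
```

Climbing at slope $M$ is exactly what keeps the first segment from approaching the graph, since $f$ itself cannot rise faster than $M$ per unit of horizontal distance.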


4 REFLECTION ACROSS LIPSCHITZ GRAPHS 


Throughout the section f is a Lipschitz function with Lipschitz constant M, 
M, = max(M, 1), and p is a regularized distance to the graph with some 
fixed 6 > 2 and e <}. The symbol C,, is used to designate a constant 
depending only on m, the dimension n, and quantities that are assumed to be 


DEFINITION 


4.1 


Exercise | 


LEMMA 
4.2 


Exercise 2 


Exercise 3 


Exercise 4 


DEFINITION 
4.3 


THEOREM 
4.4 




fixed once and for all such as $b$, $\epsilon$, the function $e_1$ used in constructing $\rho$, etc.
$A'$ is an open set in $R^{n-1}$,

$$A_+ = \{x : x' \in A',\ x_n \ge f(x')\}, \qquad A_- = \{x : x' \in A',\ x_n \le f(x')\},$$

and $A = A_+ \cup A_-$.


If $u$ is a function on $A_-$, then

$$Ru(x) = \sum_{\mu=0}^{\infty} a_\mu u(x', x_n - c_\mu\rho(x)) \quad\text{on } A_+, \qquad c_\mu = b^{\mu+1}, \qquad (1)$$

wherever the series converges.


If x A,, then (x’, x, — ¢,p(x)) © A_ and its vertical distance to the graph 
lies between b+#(x, —/f(x’)) and (b#+? — 1)(x, —f(*’)). 


If u is continuous on A_ and for each compact K’C A’ there are constants 
C, d, and v such that 


la@P SC’ fora © K° Vand “2, = —a, 


then the series for Ru converges uniformly on compact subsets of A,, and 
Ru is continuous on A, and matches u to give a continuous function on A. 


Prove the lemma. 


In order to obtain a single extension operator that can be applied to all
functions, we cut off below $f(x') - 2$. Fix once and for all a function
$\psi \in C^\infty(R^n)$ which is 1 for $x_n \ge f(x') - 1$ and 0 for $x_n \le f(x') - 2$ and is
bounded along with all derivatives.


Produce such a %. Hint: As in Lemma 12.2 of Chapter 13, regularize the 
characteristic function of the set where x, > f(x’) — 3. Note that |{s||,, depends 
only on the regularizing function, which can be fixed ahead of time and once 
and for all. 


There are constants C,, such that 


I[yutl]m S Crm||24llm- 


If u is a function on A_, then Eu is the function which is u on A_ and R(yu) 
on A... 


If uSC™A_), then Eu C™(A). There are constants Cm such that if 
u © BC™(A_), then Eu © BC™(A) and 


|Etllm < CmMG|lellm- 


By Exercise 4 the function $\psi$ can be dropped, and it can be assumed that
$u$ vanishes for $x_n < f(x') - 2$. In this case, by Exercise 1, the series for $Ru$
is finite on any compact subset of $A_+^\circ$. Therefore, $Ru \in C^m(A_+^\circ)$ and the main
problem is to show $C^m$ on $A_+$ itself. In the present situation the derivatives
are less simple, and some preparatory lemmas are needed. For purposes of
induction we define

$$R_\nu u(x) = \sum_{\mu=0}^{\infty} a_\mu\, u(x', x_n - c_\mu\rho(x))\,(-c_\mu)^\nu \quad \text{on } A_+ \tag{2}$$

wherever the series converges.
It is immediately verified that

$$\frac{\partial R_\nu u}{\partial x_j} = R_\nu\frac{\partial u}{\partial x_j} + \frac{\partial\rho}{\partial x_j}\,R_{\nu+1}\frac{\partial u}{\partial x_n}. \tag{3}$$
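Formula (3) is the chain rule applied term by term, and it can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the infinite sequences $a_\mu$, $c_\mu$ are replaced by short hypothetical lists, `u` is an arbitrary smooth test function of two variables, and `rho` is a smooth stand-in for the regularized distance (the vertical distance to the graph of $0.5\sin x_1$), not the $\rho$ constructed in Section 3. It compares a central difference of $R_\nu u$ in $x_1$ with the right side of (3).

```python
import math

# Hypothetical finite data standing in for the infinite sequences a_mu and c_mu.
A_COEFFS = [2.0, -1.5, 0.5]
C_COEFFS = [2.0, 4.0, 10.0]

def u(x1, x2):
    return math.sin(x1) * math.exp(0.3 * x2)

def du_dx1(x1, x2):
    return math.cos(x1) * math.exp(0.3 * x2)

def du_dx2(x1, x2):
    return 0.3 * math.sin(x1) * math.exp(0.3 * x2)

def rho(x1, x2):
    # Stand-in for the regularized distance (an assumption of this sketch).
    return x2 - 0.5 * math.sin(x1)

def drho_dx1(x1, x2):
    return -0.5 * math.cos(x1)

def R(nu, g, x1, x2):
    """R_nu g(x) = sum over mu of a_mu * g(x1, x2 - c_mu*rho(x)) * (-c_mu)**nu."""
    r = rho(x1, x2)
    return sum(a * g(x1, x2 - c * r) * (-c) ** nu
               for a, c in zip(A_COEFFS, C_COEFFS))

def formula_3_gap(nu, x1, x2, h=1e-5):
    """|central difference of R_nu u in x1  -  right side of (3)| at the point (x1, x2)."""
    lhs = (R(nu, u, x1 + h, x2) - R(nu, u, x1 - h, x2)) / (2 * h)
    rhs = R(nu, du_dx1, x1, x2) + drho_dx1(x1, x2) * R(nu + 1, du_dx2, x1, x2)
    return abs(lhs - rhs)

gaps = [formula_3_gap(nu, 0.7, 1.2) for nu in (0, 1, 2)]
```

The entries of `gaps` should be at the level of the finite-difference error; the identity itself does not depend on the particular choice of coefficients or of `rho`.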


In the sequel we will encounter integers $\nu$, $r$, and $q$, and systems of indices
$i$, $p$, and $q_j$. They will always satisfy the relations

$$q = \sum_j |q_j|, \qquad |p| + q = |i| + r, \tag{4}$$
and 
i= aes lar sg = i (5) 
LEMMA 4.5  $D_iR_\lambda u = R_\lambda D_iu$ plus terms of the form

$$D_{q_1}\rho \cdots D_{q_r}\rho\; R_\nu D_pu.$$

Proof  The lemma is obvious for $|i| = 0$, so we assume it is true for $|i| = m$
and apply an additional derivative in accordance with formula (3). For
example, when we differentiate one of the factors $D_{q_j}\rho$, $r$ and $|p|$ remain
the same, while $|i|$ and $q$ both jump 1. When we differentiate $R_\nu D_pu$,
there are two terms. In one, $|i|$ and $|p|$ both jump 1, while the other
indices remain the same. In the other, $r$, $|i|$, $|p|$, and $q$ all jump 1.
Consequently the relations (4) and (5) are preserved.

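LEMMA 4.6  $D_iRu = RD_iu$ plus terms of the form

$$D_{q_1}\rho \cdots D_{q_r}\rho\; R_\nu D_pu. \tag{6}$$

This is just the previous lemma with $\lambda = 0$.

Exercise 5  Since $c_\mu = b^\mu + 1 = b_\mu + 1$, the equations (2) in Section 2 become

$$\sum_{\mu=0}^{\infty} a_\mu = 1, \qquad \sum_{\mu=0}^{\infty} a_\mu c_\mu^k = 0 \quad \text{for } k \ge 1.$$

Consequently,

$$\sum_{\mu=0}^{\infty} a_\mu (c_\mu - c_0)^k c_\mu^\nu = 0 \quad \text{for } k \ge 0,\ \nu \ge 1. \tag{7}$$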
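The passage from the reflection conditions to the moment conditions in Exercise 5 can be checked exactly on a truncated system. The sketch below is an illustration under assumptions, not the book's construction: it solves the finite system $\sum_\mu a_\mu(-b^\mu)^k = 1$, $k = 0, \dots, K$, for $K + 1$ coefficients (the infinite sequence $a_\mu$ of Section 2 is not reproduced), with the hypothetical choices $b = 3$ and $K = 4$, and then evaluates the sums $\sum_\mu a_\mu c_\mu^k$ with $c_\mu = b^\mu + 1$ in exact rational arithmetic.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def truncated_coefficients(b, order):
    """a_0..a_order with sum_mu a_mu * (-b**mu)**k == 1 for k = 0..order."""
    nodes = [-(Fraction(b) ** mu) for mu in range(order + 1)]
    matrix = [[node ** k for node in nodes] for k in range(order + 1)]
    return solve_linear(matrix, [Fraction(1)] * (order + 1))

b, order = 3, 4  # hypothetical sample values
a = truncated_coefficients(b, order)
c = [Fraction(b) ** mu + 1 for mu in range(order + 1)]

# Moments sum_mu a_mu * c_mu**k for k = 0..order: 1 for k = 0, then 0.
moments = [sum(am * cm ** k for am, cm in zip(a, c)) for k in range(order + 1)]

# Truncated analogue of (7): valid while the total degree k + nu stays <= order.
relation_7 = [sum(am * (cm - c[0]) ** k * cm ** nu for am, cm in zip(a, c))
              for k in range(order) for nu in range(1, order - k + 1)]
```

Because $-b^\mu = 1 - c_\mu$, expanding $(1 - c_\mu)^k$ by the binomial theorem turns the original conditions into a triangular system for the moments, which is why every entry of `moments` after the first vanishes. In the truncated setting the analogue of (7) holds only up to total degree $K$; the full statement needs the infinite sequence.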


LEMMA 4.7  For each $\nu \ge 1$ and $k \ge 0$ there is a constant $C$ such that

$$|R_\nu u(x)| \le C\|u\|_k\,\rho(x)^k. \tag{8}$$

Proof  By Taylor's formula, centered at $x_n - c_0\rho(x)$,

$$u(x', x_n - c_\mu\rho(x)) = \sum_{j=0}^{k-1} D_n^ju(x', x_n - c_0\rho(x))\,(c_0 - c_\mu)^j\rho(x)^j/j! + r_k(x, \mu)(c_0 - c_\mu)^k\rho(x)^k$$

with

$$|r_k(x, \mu)| \le \|u\|_k/k!.$$

When any individual term from the polynomial part is inserted in the
series for $R_\nu u$, the sum is 0 because of the equations (7). When the
remainder is inserted, the result is the inequality (8) with

$$C = \frac{1}{k!}\sum_{\mu=0}^{\infty} |a_\mu|\, b^{(\mu+1)(k+\nu)}.$$

This lemma provides an estimate for the unwanted terms in Lemma 4.6.


LEMMA 4.8  There are constants $C_m$ such that if $1 \le \nu \le |i| \le m$ and $|i| \le k \le m + 1$, then

$$|D_{q_1}\rho(x) \cdots D_{q_r}\rho(x)\, R_\nu D_pu(x)| \le C_m M_0^m \|u\|_k\,\rho(x)^{k-|i|}. \tag{9}$$

Proof  According to Lemma 3.5,

$$|D_{q_1}\rho(x) \cdots D_{q_r}\rho(x)| \le C M_0^q\,\rho(x)^{r-q}.$$

According to Lemma 4.7,

$$|R_\nu D_pu(x)| \le C\|D_pu\|_{k-|p|}\,\rho(x)^{k-|p|}.$$

Putting the two together and using the relation (4) between the indices
we get (9).

LEMMA 4.9  There are constants $C_m$ such that if $u \in BC^m(A_-)$, then

$$\|Ru\|_m \le C_m M_0^m \|u\|_m,$$

hence

$$\|Eu\|_m \le C_m M_0^m \|u\|_m,$$

provided $Eu \in C^m(A)$.

Proof  Consider the decomposition in Lemma 4.6. First,

$$\|RD_iu\|_0 \le \|D_iu\|_0 \sum_{\mu} |a_\mu|.$$

Then the remainder terms are covered by Lemma 4.8 with $k = |i|$.


To complete the proof of Theorem 4.4 it remains to be shown that if
$u \in C^m(A_-)$, then $Eu \in C^m(A)$. This is done in two steps.


LEMMA 
4.10 


Proof 


End of Proof of 
Theorem 4.4 


5 


THEOREM 
5.1 


Exercise | 


LEMMA 
5.2 


Exercise 2 


If $u \in C^{m+1}(A_-)$, then $Eu \in C^m(A)$.


By Lemma 4.2, $RD_iu$ matches $D_iu$ to give a continuous function on $A$,
so what has to be shown is that the remainder terms in Lemma 4.6 do
not spoil the matching. If $\|u\|_{m+1}$ is finite, this is shown by Lemma 4.8
with $k = |i| + 1$. If $B'$ is any open subset of $A'$ whose closure is compact
and contained in $A'$, then on $B_-$ this norm is finite and we conclude that
$Eu$ is of class $C^m$ on $B$. Since this is true of every $B'$, $Eu$ is of class $C^m$ on $A$.


Let $u \in C^m(A_-)$, and let $B'$ be open in $R^{n-1}$ with $\bar B' \subset A'$ and $\bar B'$ compact.
By Lemma 1.9 there is a sequence $u_k \in C^\infty(R^n)$ with $\|u - u_k\|_{m, B_-} \to 0$.
By Lemmas 4.9 and 4.10 the sequence $Eu_k$ is Cauchy in $BC^m(B)$, so by
Exercise 7 of Section 1 $Eu_k$ has a limit $v$ in $BC^m(B)$, and it is plain that
$Eu = v$. Therefore $Eu \in C^m(B)$. Since this is true of every $B'$, $Eu \in C^m(A)$,
and the proof is finished.


REFLECTION OF HOLDER FUNCTIONS 


As defined in Section 1,

$$|u|_s = \sup\{|u(x) - u(y)|/|x - y|^s : x \ne y\}, \qquad 0 < s \le 1,$$

and the Hölder space $BC^{m,s}(A)$ consists of the functions in $BC^m(A)$ such that

$$\|u\|_{m,s} = \max\{\|u\|_m,\ |D_iu|_s : |i| = m\} < \infty.$$


The main theorem is as follows. 


There are constants C,, such that if u © BC™-(A_), then Eu © BC™S(A), and 
EUl|ms S CM ** |\ulllm,s- 


There are constants $C_m$ such that

$$\|\psi u\|_{m,s} \le C_m\|u\|_{m,s}.$$


This exercise makes it possible to drop $\psi$ in the definition of $Eu$, and to assume
instead that $u$ vanishes for $x_n < f(x') - 2$, an assumption that is made from
now on. The proof begins with a couple of lemmas.


There is a constant C, depending only on v and n, such that 
|Ryu(x)|5 < CAL 25 |u|. 


Use Lemma 3.4 to prove this lemma. 


LEMMA 
5.3 


Proof 


Proof of Theorem 5.1 


Case 1 




For each $\nu \ge 1$ and $k \ge 0$, there is a constant $C$ such that

$$|R_\nu u(x)| \le C\|u\|_{k,s}\,\rho(x)^{k+s}.$$

This is the replacement for Lemma 4.7, and it is proved in just about the
same way. In Taylor's formula we write the remainder after the term
of degree $k - 1$, then add and subtract the term of degree $k$ to get

$$u(x', x_n - c_\mu\rho) = \sum_{j=0}^{k} D_n^ju(x', x_n - c_0\rho)\,(c_0 - c_\mu)^j\rho^j/j! + r_k(x, \mu)(c_0 - c_\mu)^k\rho^k,$$

$$r_k(x, \mu) = \bigl(D_n^ku(x', t_\mu) - D_n^ku(x', x_n - c_0\rho)\bigr)/k!$$

with $t_\mu$ between $x_n - c_0\rho$ and $x_n - c_\mu\rho$. Therefore,

$$|r_k(x, \mu)| \le \frac{1}{k!}\|u\|_{k,s}\,(c_\mu - c_0)^s\rho(x)^s.$$

As in the proof of Lemma 4.7, the polynomial terms sum to 0 when put
in the series for $R_\nu$, so we have Lemma 5.3 with

$$C = \frac{1}{k!}\,C',$$

where

$$C' = \sum_{\mu=0}^{\infty} |a_\mu|\, b^{(\mu+1)(k+\nu+s)}.$$
w=0 

From Theorem 4.4 we know already that Eu © BC™(A) and have an 
evaluation of ||x||m, so all that is needed is an inequality of the form 

|DiRuls <CrMQ*|lullms> |e] = m. (1) 
For the term RD,u in the decomposition of Lemma 4.6 this inequality 
is given by Lemma 5.2. Therefore, what is needed is an inequality of 
the form 


lo(y) RyDpu(y) — 9(x)R,Dpu(x)| < GUO =P 


with 
Gt) — 2D pie: (2) 
The indices satisfy the relations (4) and (5) preceding Lemma 4.5, with 
\:| = m. We assume the notation is chosen so that 
p(x) Se), (3) 


and distinguish two cases. 


p(x) < |x —y|. According to Lemma 5.3 with k = m — |p| we have 
|R,Dpu(x)| < Cllelllm,sp(x)™'P'+s. 
According to Lemma 3.5 we have 
lo(x)| < CMG p(x). 
Combining the two and using the relations between the indices we get 


lp(x)R,Dpu(x)| = CM3|ltullm,6e(*)®- (4) 




Case 2 


6 


THEOREM 
6.1 


Remark 


Similarly, 

le) R,D,,u(_»)| at CMG llullm,se(9)*- (9) 
Now, p(x) < |x — y|, and by Lemma 3.4 p(y) < (1 + CM2)|x — 9], so 
the inequality (2) is proved. 


p(x) = |x — |. Suppose first that |p| = m. In this case, by virtue of 
the relations between the indices, ¢ = 1, so, by Lemma 3.5, |9| is bounded 
by CM%. We write the left side of (2) in the form 


o(9)(RyDpu(y) — R,Dpu(z)) + (e(9) — 9(+))R,Dpu(x). 
By Lemma 5.2 the first term is bounded in the way required by (2). 
By Lemmas 3.5, 5.3 (with & = 0), the second term is bounded by 


CMG? |\ullm,se(*)* 14 — 9. 


Since p(x) > |x — y| and s < 1, this gives (2). 

Now let |p| <_m. In this case we produce the left side of (2) by 
integrating the directional derivative of pR,D,u along the path described 
just before Lemma 3.4, a path of length < CM,|x — »|. By Lemma 4.5 
any first derivative of pR,D, is a sum of terms of the form 


Te 5 6 Dey et 


with indices satisfying |p’| +g’ = m+1+r’. By Theorem 3.3 and 
Lemma 5.3 any such derivative is therefore bounded by 


CoMB**|lttl\m,s(*) 8 


when account is taken of the relation between the indices. (Recall that 
the vertical distance from the graph to any point on the path is at least 
the vertical distance to x.) These bounds for the derivatives and the 
path length yield (2). 


REFLECTION OF SOBOLEV FUNCTIONS 


There are constants $C_m$ such that if $u \in L^p_m(A_-^\circ)$ and $u \in C^m(A_-)$, then
$Eu \in L^p_m(A)$ and

$$\|Eu\|_{L^p_m} \le C_m M_0^m \|u\|_{L^p_m}. \tag{1}$$


The theorem is true without the restriction $u \in C^m(A_-)$. With an elementary
background on Sobolev spaces it can be removed immediately. The problem
is that the functions in $L^p_m(A)$ are not defined everywhere, but are defined
more precisely than almost everywhere. In particular, if $u \in L^p_m(A_-^\circ)$, then by
the current definition $Eu$ is not defined anywhere on the graph of $f$, while,
for s > 4, the Sobolev functions cannot be undefined on that large a set. 
This particular problem is solved by extending slightly the definition of E. 


DEFINITION 
6.2 


Exercise | 


Exercise 2 




Let Ey be the current extension operator. For any locally integrable function u on 


A 


: ] 
ne) = Ni Btw, )| eee (2) 


at any point x where the right side exists. 


If $u$ is locally integrable, then $E_0u$ is locally integrable, and $Eu = E_0u$ almost
everywhere, including every point of continuity. Consequently, the extended
definition is consistent with the old one. This removes the problem with the
definition, but a few elementary facts are needed to use the new definition.
With these facts the restriction $u \in C^m(A_-)$ disappears immediately.

Since, however, we are assuming $u \in C^m(A_-)$, we know already that
$Eu \in C^m(A)$, so all that needs proof is the inequality. As usual, we begin
with an exercise to get rid of $\psi$, and thenceforth assume that $u = 0$ for
$x_n < f(x') - 2$.


There are constants $C_m$ such that

$$\|\psi u\|_{L^p_m} \le C_m\|u\|_{L^p_m}.$$


As before, the proof depends on the decomposition in Lemma 4.6, where
the first term causes no difficulty, and the others require evaluation. As usual,
the evaluation is based on Taylor's formula and on the fact that the polynomial
terms disappear when inserted in the series for $R_\nu$ because of the relations
(7) just below Lemma 4.6. Therefore, the needed estimate is one for the
remainder. Since it is an integral estimate, we use the integral form of the
remainder.


$$u(b) = \sum_{j=0}^{k-1} u^{(j)}(a)(b - a)^j/j! + r_k\,(b - a)^k, \qquad r_k = \frac{1}{(k-1)!}\int_0^1 u^{(k)}((1 - r)a + rb)\,(1 - r)^{k-1}\,dr. \tag{3}$$


Use the integral form of the remainder given in Exercise 5, Section 2, Chapter 5
to establish this one.


Because of the form of (3), for a given function $v$ we will have to consider

$$T_{r,\mu}v(x) = v(x', x_n - ((1 - r)c_0 + rc_\mu)\rho(x)), \tag{4}$$

and will need the following estimate for the $L^p$ norm.

$$\|T_{r,\mu}v\|_{L^p} \le 2^{1/p}\|v\|_{L^p}. \tag{5}$$

To see this, in the integral defining the norm make the change of variable

$$y' = x', \qquad y_n = x_n - ((1 - r)c_0 + rc_\mu)\rho(x). \tag{6}$$

The Jacobian is given by

$$J = (rc_\mu + (1 - r)c_0)\,D_n\rho(x) - 1 \ge \tfrac12, \tag{7}$$

the inequality coming from

$$c_\mu \ge c_0 = 2 \quad \text{and} \quad D_n\rho(x) \ge 1 - \varepsilon > \tfrac34.$$

The more elementary Theorem 4.5 of Chapter 14 is sufficient for the change of
variable here, since the transformation (6) is $C^\infty$.


Proof of the Theorem


For the term $RD_iu$ in Lemma 4.6 the inequality (5) gives

$$\|RD_iu\|_{L^p} \le \sum_{\mu=0}^{\infty} |a_\mu|\,\|T_{1,\mu}D_iu\|_{L^p} \le 2^{1/p}\sum_{\mu=0}^{\infty} |a_\mu|\,\|D_iu\|_{L^p}.$$

For the others, expansion by Taylor's formula in the form (3) gives

$$D_pu(x', x_n - c_\mu\rho(x)) = \sum_{j=0}^{k-1} D_n^jD_pu(x', x_n - c_0\rho(x))\,(c_0 - c_\mu)^j\rho(x)^j/j! + r_k(x, \mu)(c_0 - c_\mu)^k\rho(x)^k \tag{8}$$

with

$$r_k(x, \mu) = \frac{1}{(k-1)!}\int_0^1 T_{r,\mu}D_n^kD_pu\,(1 - r)^{k-1}\,dr.$$

Hölder's inequality gives

$$|r_k(x, \mu)|^p \le \int_0^1 |T_{r,\mu}D_n^kD_pu|^p\,dr.$$

For $k = |i| - |p|$, integration with respect to $x$ and the inequality (5) give

$$\|r_k(x, \mu)\|_{L^p} \le 2^{1/p}\|u\|_{L^p_m}. \tag{9}$$

When put in the series for $R_\nu$, the polynomial terms sum to 0, so we
have

$$R_\nu D_pu(x) = \rho(x)^k \sum_{\mu=0}^{\infty} a_\mu (c_0 - c_\mu)^k (-c_\mu)^\nu\, r_k(x, \mu).$$

Because of the standard relation between the indices and the fact that
$k = |i| - |p|$, the power of $\rho$ that dominates the derivatives of $\rho$ cancels
with $\rho^k$, and we have

$$|D_{q_1}\rho \cdots D_{q_r}\rho\, R_\nu D_pu(x)| \le C_m M_0^m \sum_{\mu=0}^{\infty} |a_\mu| (c_\mu - c_0)^k c_\mu^\nu\, |r_k(x, \mu)|.$$

Thus,

$$\|D_{q_1}\rho \cdots D_{q_r}\rho\, R_\nu D_pu\|_{L^p} \le C_m M_0^m \|u\|_{L^p_m} \sum_{\mu=0}^{\infty} |a_\mu| (c_\mu - c_0)^k c_\mu^\nu,$$

which proves the theorem.


7  EXTENSION FROM LIPSCHITZ GRAPH DOMAINS


Although uniform extension theorems can be proved in a more general setting,
the Lipschitz graph domains and the uniform Lipschitz graph domains defined
below provide enough generality for most purposes, and they avoid serious
difficulties in the proofs. Some remarks on the more general setting are given
at the end.


DEFINITION 7.1  An open set $\Omega$ is a Lipschitz graph domain if for each point $a \in \partial\Omega$ there exist
a neighborhood $G$ of $a$, a choice of the coordinates, and a Lipschitz function $f$ on
$R^{n-1}$ such that

$$\Omega \cap G = \{x : x_n < f(x')\} \cap G.$$

The problem of defining an operator $E$ which extends differentiable
functions on $\Omega$ to differentiable functions on $R^n$ is localized and brought back
to a reflection across a Lipschitz graph by the use of a partition of unity,
which we proceed to describe.


LEMMA 7.2  Let $X$ be a metric space. If the compact set $K$ is covered by the open sets $G_1, \ldots,$
$G_m$, then there are open sets $\Omega_1, \ldots, \Omega_m$ covering $K$ and satisfying $\bar\Omega_j \subset G_j$.


Proof  Let $d(x) = \max_j d(x, R^n - G_j)$. Since $d(x) > 0$ on $K$, there is a positive
$\varepsilon$ so that $d(x) > \varepsilon$ on $K$. Take

$$\Omega_j = \{x : d(x, R^n - G_j) > \tfrac12\varepsilon\}.$$


Exercise 1  If $B$ is an open ball containing $K$, the $\Omega_j$ can be taken to lie inside $B$.


LEMMA 7.3  Let $\mathcal{O}$ be a family of open sets in $R^n$. There is a sequence of open sets $\Omega_\nu$ with the
same union, such that the closure of each $\Omega_\nu$ is contained in some set in $\mathcal{O}$, and
such that every infinite subsequence has empty intersection.


Proof


If G is the union, let {K;} be a sequence of compact sets with union G 
and with K;C K?,,, and set Ky = @. Let O; consist of all intersections 
of sets in O with K},,; — Kj. Oy covers Ky, so finitely many of its sets, 
G,, ..., Gm, cover Ky. To these we apply Lemma 7.2 to select cor- 
responding ’s that cover Ky. For j > 1, 0, covers K;,, — K2,,, and we 
apply the same process to get finitely many ’s covering this set, each 
one contained in a set inO;. The totality of Q’s has union G, and since 
sets in O;,, and O; must be disjoint, any infinite set of Q’s must have 
empty intersection. 


If $\Omega$ is a Lipschitz graph domain, then the lemma provides sequences of
open sets $\Omega_\nu$ and $G_\nu$ with the following properties.


(i) The $\Omega_\nu$ cover $\partial\Omega$.
(ii) $\bar\Omega_\nu \subset G_\nu$.
(iii) Any infinite set of $G_\nu$'s has empty intersection.
(iv) In a suitable coordinate system depending on $\nu$

$$\Omega \cap G_\nu = \{x : x_n < f_\nu(x')\} \cap G_\nu,$$

where $f_\nu$ is Lipschitz.


These properties suffice to construct the extension operator $E$ for the
classes $C^m(\Omega)$. The other classes involve global norm evaluations which require
some uniformity in the properties.


DEFINITION 7.4  A Lipschitz graph domain is uniform if the $\Omega_\nu$ and $G_\nu$ can be chosen so that there
are positive numbers $\varepsilon$, $N$, and $M$ such that
(v) The distance from $\Omega_\nu$ to $R^n - G_\nu$ is $\ge \varepsilon$.
(vi) Any $N + 1$ $G_\nu$'s have empty intersection.
(vii) The Lipschitz constants of the $f_\nu$ are $\le M$.


Equivalently, an open $\Omega$ is a uniform Lipschitz graph domain if $\partial\Omega$ can be covered
by a sequence of open sets $G_\nu$ so that (iv), (vi), and (vii) hold, and in addition each
point of $\partial\Omega$ is at a distance $\ge \varepsilon$ from the complement of some $G_\nu$.


Exercise 2  Prove the equivalence stated in the definition.


Remark  A classical theorem in dimension theory asserts that the $\Omega_\nu$ in Lemma 7.3 can
be chosen so that any $n + 2$ have empty intersection. However, it does not
provide the required additional properties.


Exercise 3  If $F_0$ and $F_1$ are disjoint closed sets, then there are open sets $G_0 \supset F_0$ and
$G_1 \supset F_1$ with $\bar G_0$ and $\bar G_1$ disjoint. Hint: $G_0 = \{x : d(x, F_0) < \tfrac12 d(x, F_1)\}$.


Exercise 4  If $F_0$ and $F_1$ are disjoint closed sets, there is a function $\varphi \in C^\infty(R^n)$ with
$0 \le \varphi(x) \le 1$ for all $x$ and with $\varphi = 0$ on a neighborhood of $F_0$ and $\varphi = 1$
on a neighborhood of $F_1$. Hint: Use the last exercise along with Exercise 7,
Section 2, Chapter 12 and Exercise 10, Section 6, Chapter 7.


First we will construct a partition of unity for any Lipschitz graph domain
$\Omega$ and use it to produce an operator $E$ that extends functions in $C^m(\Omega)$ to
functions in $C^m(R^n)$. Then we refine the construction in the case of a uniform
Lipschitz graph domain so that $E$ extends functions in the other classes too.

Let $\{\Omega_\nu\}$ and $\{G_\nu\}$ have the properties (i)-(iv). Use Exercise 3 to get an
open $U_\nu \supset \bar\Omega_\nu$ with $\bar U_\nu \subset G_\nu$, and let $U$ be the union of the $U_\nu$. Then $\Omega - U$
is closed, so we can find an open $U_0 \supset \Omega - U$ with $\bar U_0 \subset \Omega$. For convenience
of notation, set $G_0 = \Omega$. Now use Exercise 4 to produce functions of class $C^\infty$,
with values between 0 and 1, as follows:

$\psi_\nu = 1$ on $U_\nu$, $\quad = 0$ on a neighborhood of $R^n - G_\nu$,
$\psi = 1$ on a neighborhood of $\bar\Omega$, $\quad = 0$ on a neighborhood of $R^n - (\Omega \cup U)$.

At each point of $\Omega \cup U$ at least one of the functions $\psi_\nu$ is positive, and each
point has a neighborhood on which all but finitely many are 0. Consequently,
the functions

$$\varphi_\nu(x) = \psi(x)\psi_\nu(x)\Bigl(\sum_{\mu=0}^{\infty} \psi_\mu(x)^2\Bigr)^{-1/2} \tag{1}$$

have the following properties.

(I) $\varphi_\nu \in C^\infty(R^n)$ and $\varphi_\nu = 0$ outside a closed subset of $G_\nu$.

(II) Each point of $R^n$ has a neighborhood on which all but finitely
many $\varphi_\nu$ are 0.

(III) $\sum_{\nu=0}^{\infty} \varphi_\nu^2 = 1$ on a neighborhood of $\bar\Omega$.


Remark  The functions $\varphi_\nu^2$ form a partition of unity for $\Omega$.

For $\nu \ge 1$, let $A_\nu = \{x : x_n < f_\nu(x')\}$. If $u$ is a function on $\Omega$ let

$u_0(x) = \varphi_0(x)u(x)$ if $x \in \Omega$, $\quad 0$ otherwise,
$u_\nu(x) = \varphi_\nu(x)u(x)$ if $x \in A_\nu \cap G_\nu$, $\quad 0$ if $x \in A_\nu - G_\nu$.


LEMMA 7.5  If $u \in C^m(\Omega)$, then $u_0 \in C^m(R^n)$.


Proof  If $a \in \Omega$, then on a neighborhood of $a$ $u_0$ coincides with $\varphi_0u$, while if
$a \notin \Omega$, then on a neighborhood of $a$ $u_0$ is 0.


LEMMA 7.6  If $u \in C^m(\Omega)$, then $u_\nu \in C^m(A_\nu)$.


Proof  If $a \in G_\nu$, then $u_\nu$ coincides with $\varphi_\nu u$ on the intersection of $A_\nu$ with a
neighborhood of $a$, while if $a \notin G_\nu$, then $u_\nu$ is 0 on the intersection of $A_\nu$
with a neighborhood of $a$.


DEFINITION 7.7  Let $E_\nu$ be an extension operator for $A_\nu$. If $u$ is a function on $\Omega$, then

$$Eu = \varphi_0u_0 + \sum_{\nu=1}^{\infty} \varphi_\nu E_\nu u_\nu. \tag{2}$$


THEOREM 7.8  Let $\Omega$ be a Lipschitz graph domain. If $u \in C^m(\Omega)$, then $Eu \in C^m(R^n)$, and $Eu$
is an extension of $u$.


Proof  By Lemmas 7.5 and 7.6 and the fact that $E_\nu$ is an extension operator,
each term in the sum in (2) lies in $C^m(R^n)$. Since each point has a
neighborhood on which the sum is finite, it follows that $Eu \in C^m(R^n)$.
To show that $Eu$ is an extension of $u$ we show that on $\Omega$

$$\varphi_\nu E_\nu u_\nu = \varphi_\nu^2 u$$

and use the relation (III). If $x \notin G_\nu$, then both sides are 0. If $x \in G_\nu$,
then $x \in \Omega \cap G_\nu = A_\nu \cap G_\nu$, and

$$E_\nu u_\nu(x) = u_\nu(x) = \varphi_\nu(x)u(x).$$


A corresponding result for the other function classes requires norm
evaluations, which in turn require uniform bounds on the derivatives of the
$\varphi_\nu$, the Lipschitz constants of the $f_\nu$, and the number of $G_\nu$ with nonempty
intersection.


LEMMA 7.9  There are constants $K_m$ such that if $E_0$ and $E_1$ are measurable sets
with $d(E_0, E_1) \ge \varepsilon$, then there is a function $\psi \in C^\infty(R^n)$ with
$0 \le \psi(x) \le 1$, $\psi = 0$ on the set $\{x : d(x, E_0) < \varepsilon/3\}$, $\psi = 1$ on the set
$\{x : d(x, E_1) < \varepsilon/3\}$, and

$$\|\psi\|_m \le K_m\varepsilon^{-m}.$$


Exercise 5  Prove the lemma by expanding on the proof of Lemma 12.2, Chapter 13.


LEMMA 7.10  If $\Omega$ is a uniform Lipschitz graph domain, then the $\varphi_\nu$ can be chosen so that

$$\|\varphi_\nu\|_m \le K_m,$$

where the $K_m$ are independent of $\nu$.


Proof  Take $U_\nu = \{x : d(x, \Omega_\nu) < \varepsilon/3\}$. The distances from $\Omega_\nu$ to $R^n - G_\nu$,
$\Omega - U$ to $R^n - \Omega$, and $\Omega$ to $R^n - (\Omega \cup U)$ are at least $\varepsilon$, $\varepsilon/3$, and $\varepsilon/3$,
respectively. By Lemma 7.9, $\psi_\nu$ and $\psi$ can be chosen with bounds of the
kind specified. Let $\varphi$ be the sum of the squares of the $\psi_\nu$, so that $\varphi_\nu =
\psi\psi_\nu\varphi^{-1/2}$. Since the sum defining $\varphi$ has at most $N$ nonzero terms at each
point, $\varphi$ also has bounds of the kind specified. Any derivative of $\varphi^{-1/2}$
of order $\le m$ is a sum of products of derivatives of $\varphi$ of orders $\le m$ and
negative powers of $\varphi$ of degree at most $m + \tfrac12$. The result therefore
follows from the fact that $\varphi \ge 1$ wherever $\psi$ is $\ne 0$.


THEOREM 7.11  Let $\Omega$ be a uniform Lipschitz graph domain. If $u$ lies in one of the spaces
$BC^m(\Omega)$, $BC^{m,s}(\Omega)$, or $L^p_m(\Omega)$, then $Eu$ lies in the corresponding space on $R^n$, and

$$\|Eu\| \le C_m\|u\|,$$

where the norm is the corresponding norm and the constants $C_m$ depend only on $\Omega$.
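Before turning to the proof, the normalization $\varphi_\nu = \psi\psi_\nu\varphi^{-1/2}$ used for Lemma 7.10 can be made concrete in one dimension. The following is a minimal sketch under assumptions, not the construction of the text: the `plateau` functions are hypothetical stand-ins for the $\psi_\nu$, built from a standard $C^\infty$ step; the intervals and the margin are arbitrary sample values; and the outer cutoff $\psi$ is taken to be 1 on the region examined. It checks that the squares of the normalized functions sum to 1 wherever some $\psi_\nu$ equals 1.

```python
import math

def smooth_step(t):
    """A C-infinity step: 0 for t <= 0, 1 for t >= 1, strictly between otherwise."""
    def g(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    return g(t) / (g(t) + g(1.0 - t))

def plateau(x, left, right, margin):
    """Equals 1 on [left, right] and 0 outside (left - margin, right + margin)."""
    rising = smooth_step((x - (left - margin)) / margin)
    falling = smooth_step(((right + margin) - x) / margin)
    return rising * falling

# Hypothetical pieces: three overlapping plateaus covering [0, 3].
PIECES = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
MARGIN = 0.5

def psi_values(x):
    return [plateau(x, left, right, MARGIN) for left, right in PIECES]

def phi_values(x):
    """psi_nu / sqrt(sum of psi_mu**2), with value 0 where every psi_mu vanishes."""
    psis = psi_values(x)
    total = sum(p * p for p in psis)
    if total == 0.0:
        return [0.0] * len(psis)
    return [p / math.sqrt(total) for p in psis]

grid = [i / 100 for i in range(0, 301)]
worst_gap = max(abs(sum(p * p for p in phi_values(x)) - 1.0) for x in grid)
```

On $[0, 3]$ the sum of squares of the `plateau` functions is at least 1, which is the one-dimensional analogue of the fact $\varphi \ge 1$ used at the end of the proof of Lemma 7.10; the division by the square root is therefore harmless there.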


Proof  We have

$$D_iEu = D_i(\varphi_0u_0) + \sum_{\nu=1}^{\infty} D_i(\varphi_\nu E_\nu u_\nu). \tag{3}$$

From the fact that the extension operators $E_\nu$ have uniform bounds and
Lemma 7.10 we get

$$|D_i(\varphi_\nu E_\nu u_\nu)(x)| \le C_mK_m\|u\|_m$$

and a similar inequality for $D_i(\varphi_0u_0)(x)$. For any given $x$ there are at most
$N + 1$ nonzero terms. Therefore,

$$\|Eu\|_m \le (N + 1)C_mK_m\|u\|_m.$$

An almost identical argument works for $BC^{m,s}$, but a little more is
needed in the Sobolev case.

If $\{a_\nu\}$ is a sequence with at most $N + 1$ nonzero terms, then Hölder's
inequality gives

$$\Bigl(\sum |a_\nu|\Bigr)^p \le (N + 1)^{p-1}\sum |a_\nu|^p.$$

Consequently, at any point we have

$$|D_iEu|^p \le (N + 1)^{p-1}\Bigl(|D_i(\varphi_0u_0)|^p + \sum_{\nu} |D_i(\varphi_\nu E_\nu u_\nu)|^p\Bigr).$$

Therefore,

$$\|Eu\|^p_{L^p_m(R^n)} \le (N + 1)^{p-1}\Bigl(\|\varphi_0u_0\|^p_{L^p_m(R^n)} + \sum_{\nu} \|\varphi_\nu E_\nu u_\nu\|^p_{L^p_m(R^n)}\Bigr).$$

From the fact that the extension operators $E_\nu$ have uniform bounds and
Lemma 7.10 we have

$$\|\varphi_\nu E_\nu u_\nu\|_{L^p_m(R^n)} \le C_mK_m\|\varphi_\nu u\|_{L^p_m(\Omega)}.$$

Now, $\|\varphi_\nu u\|^p_{L^p_m(\Omega)}$ is a sum of terms of the form

$$\int_\Omega |D_j\varphi_\nu|^p\,|D_ku|^p\,dx, \qquad |j| + |k| \le m,$$

so the result follows from the fact that

$$\sum_{\nu} |D_j\varphi_\nu(x)|^p \le (N + 1)K_m^p.$$


Remark  The extension procedure described here is quite different from Whitney's,
both in its nature and in the situations to which it applies. For example, the
Whitney extension does not apply to the Sobolev spaces, and in the other cases
it provides increasingly complex extension operators as the order of
differentiability increases. On the other hand, it does apply to more general domains
than the Lipschitz graph domains.

At least for the Sobolev spaces, a uniform extension operator does exist
for domains that are more general, though not as general as Whitney's. They
are domains with a sort of polyhedral structure in which the cells are uniform
Lipschitz graph domains. When there are a finite number of bounded cells,
the condition for extension is that the cells meet nontangentially and that the
boundary of $\Omega$ does not separate $\Omega$ locally. This result appeared first in an
article of Adams, Aronszajn, and Smith in University of Kansas Technical
Reports, New Series, No. 8, 1964; and subsequently in the Annales de l'Institut
Fourier, 1967. The Lipschitz graph case was presented (with a similar but not
identical proof) by E. M. Stein in lectures at the University of Paris, Orsay,
during 1966-1967. Stein's proof is given in his book Singular Integrals and
Differentiability Properties of Functions (Princeton University Press, 1970). The first
theorems on the extension of Sobolev functions across Lipschitz graphs were
given by A. P. Calderón. Calderón's method works for an arbitrary but fixed
finite order of differentiability and for norms in which singular integral
operators are bounded, e.g. $L^p$ with $1 < p < \infty$.
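Returning to the Sobolev case in the proof of Theorem 7.11: the elementary inequality $(\sum |a_\nu|)^p \le (N + 1)^{p-1}\sum |a_\nu|^p$ for at most $N + 1$ nonzero terms is easy to probe numerically. The sketch below is only an illustration; the sample sizes, exponents, and random values are arbitrary assumptions, and a numerical check is of course no substitute for the Hölder argument.

```python
import random

def power_sum_gap(values, p):
    """Return count**(p-1) * sum|a|**p - (sum|a|)**p for a list of `count` numbers."""
    count = len(values)
    left = sum(abs(v) for v in values) ** p
    right = count ** (p - 1) * sum(abs(v) ** p for v in values)
    return right - left

random.seed(0)
gaps = [
    power_sum_gap([random.uniform(-5.0, 5.0) for _ in range(n_terms)], p)
    for n_terms in (1, 2, 5, 9)
    for p in (1, 1.5, 2, 3)
]

# Equality holds when all terms have the same absolute value.
equal_case = power_sum_gap([2, 2, 2], 2)
```

Every entry of `gaps` should be nonnegative up to rounding, and the equal-terms case shows that the constant $(N + 1)^{p-1}$ cannot be improved.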


Index 


|Aj, 294 

|Alm, 372 

am, 372 

absolute value, 127, 150 
absolutely continuous, 351, 362, 363, 367, 383, 386 
absolute value on Y&,,,, 205 
acceleration, 9, 166, 169 
Adams, R., 424, 442 
addition, 123, 196 

adjoint, 200 

BiG BIS 

affine independence, 185 
algebraic curve, 249 
almost everywhere, 313 
analytic, 107 

analytic continuation, 253 
angle, 25, 167 

arccos, 38 

arc length, 67, 69, 174, 389 
arcsin, 38 

area, 11, 50, 52, 376 

area measure, 372 
argument, 126 

Aronszajn, N., 424, 442 


(I), 130 

ball, 134 

basis, 181 

BC™(A), 419 

BC™S(A), 419 

Brouwer degree, 402, 403, 406 
Brouwer fixed-point theorem, 409 




c™, 280 

Co, 336 

c™(A), 418 

(I), 131 

Calderón, A. P., 442

Cantor function, 351 
Cauchy-Schwarz inequality, 127 
Cauchy sequence, 95, 130 
center of mass, 173 
centrifugal force, 169 

chain rule, 29, 231 

change of variable, 60, 367, 411 
characteristic function, 314 
chord, 3, 165, 263 

class C, 234, 268, 280, 281, 284 
class C°, 107 

closed set, 134 

closure, 137 

compact, 145 

comparison test, 94 
complete, 131 

complex multiplication, 125 
complex numbers, 125 
composite function, 29, 143 
connected, 138 

connected component, 142 
continuous, 12, 24, 129 
converge, 89, 90, 129 
converge absolutely, 96, 339 
convolution, 333 
coordinate, 182 

coordinate function, 163 




countable, 297 

counting measure, 301 
Cramer’s rule, 222 

critical point, 342, 345, 389 
critical value, 342, 345, 389 


d(x, A), 137 

Dv, Dv, D*v, D,v, 355 

a, F, 245 

deg f (degree), 402, 403, 406 
deg(f;_v), 398 

derivative, 5, 18, 161, 350, 355 
determinant, 203, 217, 219 
diameter, 147 


differentiable, 18, 161, 225, 268, 350, 355 


differential, 225, 268 

dimension, 179, 255, 256 

direction, 159 

directional derivative, 223 
discriminant, 277 

distance, 129 

diverge, 89, 90 

dominated convergence theorem, 318 


e(p), 217 

Egoroff’s Theorem on convergence, 368 
eigenvalue, 208 

eigenvector, 208 

equivalent absolute values, 132, 150 
equivalent metrics, 132 

exponential, 32, 113 


extension from Lipschitz graph domains, 437 


extension operator, 429, 438 


JS ~ & 404, 406 

F,, 307 

face, 375 

Fatou’s lemma, 316 

final point, 139 

fixed point, 236 

Fourier coefficients, 191
Fourier series, 191

Fubini’s theorem, 322, 323 
function, 36 

functional dependence, 347 


fundamental theorem of algebra, 409
fundamental theorem of calculus, 56 


G5, 307 

gamma function (I), 320 
gradient, 227, 269 
graph, 195, 255 
gravitation, 169 

greatest lower bound, 34 


half-life, 159 
Hausdorff measure, 305 
Hestenes reflection, 421 
Hölder space, 419, 432
homeomorphic, 410
homogeneous, 286
homotopic, 404, 406
l'Hospital's rule, 46


I(p), 312 

identity, 196 

implicit-function theorem, 245, 282 
improper integral, 77, 78 
indefinite integral, 349 

infimum (inf), 35 

initial point, 139, 159 

inner product, 123, 124 
integrable, 317 

integrable (Riemann), 54, 162 
integral, 54, 313, 317, 318 

integral over A, 314 

integration by parts, 63, 65 
interval, 35 

inverse, 37, 196, 198 
inverse-function theorem, 238, 282, 408 


Jacobi matrix, 227 
Jacobian, 221, 373, 378 


K,, 308 
Kepler’s laws, 173 
Kirszbraun Extension Theorem, 417 


LP(A), 419 
Pins 205 


Lagrange multipliers, 269, 270 

least upper bound, 34 

Lebesgue integral, 313, 318 
Lebesgue measure, 294 

Lebesgue point, 363 

left-hand limit, 17 

length of a curve, 67, 69 
Lichtenstein reflection, 420 

limit, 5, 16 

limit 00, 47 

limit inferior, 40, 96 

limit of a sequence, 89 

limit superior, 40, 94 

line, 158 

line segment, 140 

linear combination, 180 

linear equations, 194, 203 

linear transformation, 192 

linearly dependent (independent), 180 
Lipschitz graph domain, 419, 437 
Lipschitz graph domain (uniform), 438 
Lipschitz space, 419 

Lipschitzian (locally), 383 

local inverse, 238 

local max and min, 45, 87, 270, 286 
local parametric representation, 256 


local parametric representation by arc length, 264 


logarithm, 31, 113 
lower Riemann sum, 53 
lower semicontinuous, 149 


LE, 304 

manifold, 255 

matrix, 193, 194, 195 

maximum, 7 

mean-value theorem, 44, 234 
mean-value theorem for integrals, 86
measurable function, 309 
measurable set, 301 

measure 0 locally, 298 

metric space, 129 

minimax, 210 

minimum, 7 

mollifying, 336 

monotone convergence theorem, 315 
multiple series, 339 




N(E; y), 376 

Newton's laws, 9, 169 

norm (= absolute value), 127, 205 
normal, 263 

nowhere dense, 137 

null space, 198 


(O)., Zia 

one-dimensional manifold, 264-267 
one to one, 36, 37 

open set, 134 

oriented line, 160 

orthogonal, 167 

orthogonal complement, 188 
orthogonal transformation, 212 
orthonormal, 186 

outer measure, 301 


parallel subspace, 184 
parametric equations, 69 
parametric representation by arc length, 176 
partial derivative, 223, 279, 282 
partial sum, 92 

partition, 50, 66 

partition of unity, 437, 439 
path, 139 

path connected, 139 
permutation, 97, 216 
perpendicular, 167 

plane, 183 

point of density, 363 
pointwise convergence, 99 
polar coordinates, 71, 392 
polygon, 67 

polyhedron, 375 
polynomial, 85 

positive definite, 211 

power series, 103, 340 
primitive, 57 

product, 125, 151, 196, 197
product measure, 321, 322 
projection, 195, 212 


quadratic form, 272 
quadric surface, 273 




R", 123 
R(I), 131 


Rademacher’s Theorem on change of variable, 414 
Rademacher’s Theorem on differentiability, 368 


radius of convergence, 103, 105 
range, 36, 198 

rank, 202 

ratio test, 106 

rectangle, 140, 294 

refinement, 53 

reflection across hyperplanes, 420 
reflection across Lipschitz graphs, 428 
reflection of Hölder functions, 432
reflection of Sobolev functions, 434 
regular Borel measure, 348 

regular point, 255, 345, 389 
regular value, 341, 345, 389 
regularized distance, 424 
regularizing, 336 

retraction, 409 

Riemann integrable, 54, 162 
Riemann sums, 67 

right-hand limit, 18 

rigid motion, 214 


[S], 179 

o-finite, 322 

Sard’s theorem, 342, 345, 347, 389 
scalar multiplication, 123, 196 
section, 322, 323 

Seeley reflection, 421 

segment, 140 

self-adjoint, 208 

sequence, 89, 339 

series, 90, 339 

sign of a permutation, 217 

simple function, 312 

simplex, 373 

slope, 3 

smooth manifold, 255, 284 
Sobolev space, 419 

span, 179
speed, 166

sphere, 134 


square root, 211 

Stein, E., 442 

Stone-Weierstrass approximation theorem, 152
subsequence, 143 

subspace, 178 

sum, 196 

supremum (sup), 35 

surface, 255, 259 


tangent line, 5, 164 

tangent plane, 261 

tangent space, 261 

Taylor polynomial, 80, 285, 341 
Taylor series, 91, 341 

Taylor’s formula, 80, 81, 86, 285 
Tietze Extension Theorem, 416 
torus, 259, 271, 345 

totally bounded, 146 

translate, 183 

transposition, 216 

triangle inequality, 129 
trigonometric functions, 25, 115 
trigonometric polynomial, 153, 157 


uniform continuity, 38 
uniform convergence, 99 
upper Riemann sum, 53 


Vandermonde matrix, 421 
variation, 384 

vector, 167 

vector field, 407 

velocity, 9, 166 

Vitali covering theorem, 356 
volume, 74 


Weierstrass approximation theorem, 117

Weierstrass M test, 103 

Whitney Extension Theorems, 418 

Whitney property, 418 

Whitney’s formula for extending Lipschitz functions, 
414, 417 

work, 14 


Undergraduate Texts in Mathematics 


Malitz: Introduction to Mathematical 
Logic: Set Theory - Computable 
Functions - Model Theory. 

1979. xii, 198 pages. 2 illus. 


Martin: The Foundations of Geometry 
and the Non-Euclidean Plane. 
1975. xvi, 509 pages. 263 illus. 


Martin: Transformation Geometry: An 
Introduction to Symmetry. 
1982. xii, 237 pages. 209 illus. 


Millman/Parker: Geometry: A Metric 
Approach with Models. 
1981. viii, 355 pages. 259 illus. 


Prenowitz/Jantosciak: Join Geometrics: 
A Theory of Convex Set and Linear 
Geometry. 

1979. xxii, 534 pages. 404 illus. 


Priestly: Calculus: An Historical 
Approach. 
1979, xvii, 448 pages. 335 illus. 


Protter/Morrey: A First Course in Real 
Analysis. 
1977. xii, 507 pages. 135 illus. 


Ross: Elementary Analysis: The Theory 
of Calculus. 
1980. viii, 264 pages. 34 illus. 


continued from li 


Sigler: Algebra. 
1976. xii, 419 pages. 27 illus. 


Simmonds: A Brief on Tensor 
Analysis. 
1982. xi, 92 pages. 28 illus. 


Singer/Thorpe: Lecture Notes on 
Elementary Topology and Geometry. 
1976, viii, 232 pages. 109 illus. 


Smith: Linear Algebra. 
1978. vii, 280 pages. 21 illus. 


Smith: Primer of Modern Analysis 
1983. xiii, 442 pages. 45 illus. 


Thorpe: Elementary Topics in Differential 
Geometry. 
1979. xvii, 253 pages. 126 illus. 


Troutman: Variational Calculus 
with Elementary Convexity. 
1983. xiv, 364 pages. 73 illus. 


Whyburn/ Duda: Dynamic Topology. 
1979. xiv, 338 pages. 20 illus. 


Wilson: Much Ado About Calculus: 

A Modern Treatment with Applications 
Prepared for Use with the Computer. 
1979, xvii, 788 pages. 145 illus. 


This edition is a comprehensive introduction to the basic ideas of
modern mathematical analysis. Coverage proceeds from the elementary
level to advanced and research levels. Additions to this edition
include Rademacher's theorem on differentiability of Lipschitz functions,
deeper formulas on change of variables in multiple integrals, and recent
results on the extension of differentiable functions.


ISBN 0-387-90797-1 
ISBN 3-540-90797-1 


